Details

Reviewers

spatel
andreadb
mkuper
hfinkel

Commits

rGbca02f9e2014: [CostModel][X86] Add support for broadcast shuffle costs
rL291122: [CostModel][X86] Add support for broadcast shuffle costs

Summary

Currently only for broadcasts with input and output of the same width.

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon updated this revision to Diff 81588. Dec 15 2016, 8:06 AM

RKSimon retitled this revision from to [CostModel][X86] Add support for broadcast shuffle costs.

RKSimon updated this object.

RKSimon added reviewers: mkuper, hfinkel, spatel, andreadb.

RKSimon set the repository for this revision to rL LLVM.

RKSimon added a subscriber: llvm-commits.

Hi Simon,

I noticed that getShuffleCost is becoming quite big.
Do you think it makes sense to split the logic in getShuffleCost into three parts (one function for each supported ShuffleKind)? If so, getShuffleCost could be refactored in a separate commit.

lib/Target/X86/X86TargetTransformInfo.cpp
616	Shouldn't this be `LT.first * Entry->Cost`? I think that you are not accounting for the type legalization cost.
628	Same.
642	Same (also, see lines 654, 664, 677 and 686).
681–688	Can we not just simplify this code into something like this? `if (ST->hasSSE1() && LT.second == MVT::v4f32) return LT.first;` You are basically performing a lookup on a table with just a single entry.

In D27811#623731, @andreadb wrote:

I noticed that getShuffleCost is becoming quite big.

You're not kidding ;-)

Do you think it makes sense to split the logic in getShuffleCost into three parts (one function for each supported ShuffleKind)? If so, getShuffleCost could be refactored in a separate commit.

I have considered hijacking the 'ISD::VECTOR_SHUFFLE' int entry in both CostTblEntry and CostTableLookup to do lookup based on TTI::ShuffleKind instead - what do you think?

lib/Target/X86/X86TargetTransformInfo.cpp
616	That's what I meant in the disclaimer at the top - as we're broadcasting we only reference the first input register and all the outputs are the same - so the costs aren't multiplied by the LT.first scale factor (num vectors). It doesn't account for any register moves that occur but that's true for most throughput costs.
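
The scaling rule described in this reply can be illustrated with a small standalone sketch. It is not the LLVM code: `ShuffleKind`, `LegalizedType` and `scaledShuffleCost` are hypothetical names, and the per-register cost is passed in directly rather than looked up in a table.

```cpp
#include <cassert>

// Hypothetical stand-ins for TTI::ShuffleKind and for the register count
// that type legalization reports (LT.first); not the real LLVM types.
enum class ShuffleKind { Broadcast, Reverse, Alternate };

struct LegalizedType {
  int NumRegisters; // plays the role of LT.first
};

// Cost of a shuffle whose legal-width form costs PerRegisterCost.
// A broadcast reads only the first input register and every output register
// is identical, so its cost is not multiplied by the register count.
int scaledShuffleCost(ShuffleKind Kind, LegalizedType LT, int PerRegisterCost) {
  assert(LT.NumRegisters >= 1 && PerRegisterCost >= 0);
  if (Kind == ShuffleKind::Broadcast)
    LT.NumRegisters = 1;
  return LT.NumRegisters * PerRegisterCost;
}
```

Under these assumptions, a type that splits into four legal registers costs `PerRegisterCost` for a broadcast and four times that for a reverse. As the reply notes, register moves are not counted.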

mkuper added inline comments. Dec 15 2016, 9:24 AM

lib/Analysis/CostModel.cpp
93 ↗	(On Diff #81588)	We already have this helper in CGP (that version also checks whether you're splatting any element, not just element 0). Maybe make it common? Not entirely sure what the appropriate place for it is, though. Would it make sense for CGP to use CostModel?

In D27811#623755, @RKSimon wrote:

In D27811#623731, @andreadb wrote:

I noticed that getShuffleCost is becoming quite big.

You're not kidding ;-)

Do you think it makes sense to split the logic in getShuffleCost into three parts (one function for each supported ShuffleKind)? If so, getShuffleCost could be refactored in a separate commit.

I have considered hijacking the 'ISD::VECTOR_SHUFFLE' int entry in both CostTblEntry and CostTableLookup to do lookup based on TTI::ShuffleKind instead - what do you think?

I think it is a good idea :-). After all, the opcode can only be ISD::VECTOR_SHUFFLE in this method. So, that bit of information doesn't really need to be stored in any entries.
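
Keying the cost tables on the shuffle kind instead of the ISD opcode might look roughly like the standalone sketch below. `ShuffleCostEntry`, `SimpleVT`, `lookupShuffleCost` and `exampleCost` are hypothetical simplifications, not LLVM's `CostTblEntry` or `CostTableLookup`, and the rows only mirror a few SSE entries from this patch.

```cpp
#include <cstddef>

// Hypothetical, simplified stand-ins for TTI::ShuffleKind and MVT.
enum class ShuffleKind { Broadcast, Reverse, Alternate };
enum class SimpleVT { v4f32, v2f64, v4i32 };

struct ShuffleCostEntry {
  ShuffleKind Kind; // occupies the slot that would otherwise hold the opcode
  SimpleVT Type;
  int Cost;
};

// Linear search over a fixed-size table; returns nullptr when nothing matches.
template <std::size_t N>
const ShuffleCostEntry *lookupShuffleCost(const ShuffleCostEntry (&Table)[N],
                                          ShuffleKind Kind, SimpleVT Type) {
  for (const ShuffleCostEntry &Entry : Table)
    if (Entry.Kind == Kind && Entry.Type == Type)
      return &Entry;
  return nullptr;
}

static const ShuffleCostEntry ExampleTbl[] = {
    {ShuffleKind::Broadcast, SimpleVT::v4f32, 1}, // shufps
    {ShuffleKind::Reverse, SimpleVT::v4f32, 1},   // shufps
    {ShuffleKind::Alternate, SimpleVT::v4f32, 2}, // 2*shufps
    {ShuffleKind::Broadcast, SimpleVT::v4i32, 1}, // pshufd
};

// Returns the table cost, or -1 when the sketch table has no matching row.
int exampleCost(ShuffleKind Kind, SimpleVT Type) {
  const ShuffleCostEntry *Entry = lookupShuffleCost(ExampleTbl, Kind, Type);
  return Entry ? Entry->Cost : -1;
}
```

Because every row already belongs to a shuffle, the kind field carries information the opcode field could not, which lets one table serve several shuffle kinds.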

lib/Target/X86/X86TargetTransformInfo.cpp
616	Ah I see. That makes sense.

RKSimon added inline comments. Dec 15 2016, 10:09 AM

lib/Analysis/CostModel.cpp
93 ↗	(On Diff #81588)	Adding isSplatMask/isSplat/getSplatIndex support to ShuffleVectorInst (is it worth adding all of them to match ShuffleVectorSDNode?) would be trivial, then both CodeGenPrepare and CostModel could use it easily. It would probably be worth upgrading SK_Broadcast at the same time to support broadcasting any Index value (not just 0) - given that almost nothing uses it so far that shouldn't be a problem. What other cost model cases did you have in mind? CodeGenPrepare::optimizeShuffleVectorInst seems quite limited.
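
A splat check that accepts any source index, as discussed here, could be sketched as follows. This is a standalone illustration over `std::vector<int>`, not the LLVM helper: the names `getSplatIndexSketch` and `isSplatMaskSketch` are hypothetical, negative mask elements are assumed to mean undef, and treating an all-undef mask as a splat of index 0 is a choice made only for this sketch.

```cpp
#include <vector>

// Returns the single source index that every defined mask element selects,
// or -1 if the defined elements disagree. Negative elements are undef lanes.
// A mask with no defined elements is reported as a splat of index 0.
int getSplatIndexSketch(const std::vector<int> &Mask) {
  int SplatIndex = -1;
  for (int Elt : Mask) {
    if (Elt < 0)
      continue; // an undef lane is compatible with any splat index
    if (SplatIndex < 0)
      SplatIndex = Elt; // the first defined lane fixes the index
    else if (Elt != SplatIndex)
      return -1; // two defined lanes select different elements
  }
  return SplatIndex < 0 ? 0 : SplatIndex;
}

// True for a non-empty mask whose defined lanes all select the same element.
bool isSplatMaskSketch(const std::vector<int> &Mask) {
  return !Mask.empty() && getSplatIndexSketch(Mask) >= 0;
}
```

A cost model that only handles broadcasts of element 0 would additionally require the returned index to be 0.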

mkuper added inline comments. Dec 15 2016, 11:44 AM

lib/Analysis/CostModel.cpp
93 ↗	(On Diff #81588)	Quoting RKSimon: "It would probably be worth upgrading SK_Broadcast at the same time to support broadcasting any Index value (not just 0) - given that almost nothing uses it so far that shouldn't be a problem." Thinking about it a bit more, I'm not entirely sure about this. Ignoring what we do in the DAG for a moment, in IR, I'd expect the canonical insert + splat pattern to use index 0. Quoting RKSimon: "What other cost model cases did you have in mind? CodeGenPrepare::optimizeShuffleVectorInst seems quite limited." I didn't, really, just trying to avoid code duplication.

Added ShuffleVectorInst::isSplat as suggested.

Regarding the refactor to use TTI::ShuffleKind instead of ISD::VECTOR_SHUFFLE in the LUTs - is everyone happy for me to commit this to trunk first? I'll then update this patch with the new scheme.

RKSimon mentioned this in D28118: AVX-512 cost calculation for interleave load/store patterns. Dec 29 2016, 2:56 AM

delena added a subscriber: delena. Dec 29 2016, 3:19 AM

In D27811#623924, @andreadb wrote:

In D27811#623755, @RKSimon wrote:

In D27811#623731, @andreadb wrote:

I noticed that getShuffleCost is becoming quite big.

You're not kidding ;-)

Do you think it makes sense to split the logic in getShuffleCost into three parts (one function for each supported ShuffleKind)? If so, getShuffleCost could be refactored in a separate commit.

I have considered hijacking the 'ISD::VECTOR_SHUFFLE' int entry in both CostTblEntry and CostTableLookup to do lookup based on TTI::ShuffleKind instead - what do you think?

I think it is a good idea :-). After all, the opcode can only be ISD::VECTOR_SHUFFLE in this method. So, that bit of information doesn't really need to be stored in any entries.

May I ask you to postpone refactoring of getShuffleCost() to the next commit? Andrea, I have another patch https://reviews.llvm.org/D28118 that also changes ShuffleCost for X86 targets.

mssimpso added a subscriber: mssimpso. Dec 29 2016, 4:03 AM

In D27811#632164, @delena wrote:

May I ask you to postpone refactoring of getShuffleCost() to the next commit? Andrea, I have another patch https://reviews.llvm.org/D28118 that also changes ShuffleCost for X86 targets.

No problem, I'll do the cleanup/refactor after your interleaving patch

delena added inline comments. Jan 2 2017, 1:04 AM

lib/Analysis/CostModel.cpp
498 ↗	(On Diff #82260)	I suggest simplifying the code to just one function: `static bool isZeroEltBroadcastVectorMask(SmallVectorImpl<int> &Mask) { for (unsigned i = 0; i < Mask.size(); ++i) if (Mask[i] > 0) return false; return true; }`
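
For readers who want to try the suggested check outside LLVM, here is a standalone variant. It swaps `SmallVectorImpl<int>` for `std::vector<int>` and uses the hypothetical name `isZeroEltBroadcastMaskSketch`; it assumes the usual convention that -1 marks an undef lane.

```cpp
#include <vector>

// Accepts a mask when no lane selects a non-zero element. Because the test is
// "Elt > 0", undef lanes (negative values) pass, and so does an all-undef mask.
bool isZeroEltBroadcastMaskSketch(const std::vector<int> &Mask) {
  for (int Elt : Mask)
    if (Elt > 0)
      return false;
  return true;
}
```

The `> 0` comparison is what lets a partially undef mask such as `{0, -1, 0, -1}` still count as a broadcast of element 0.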

RKSimon mentioned this in rL290956: [X86] Merged Reverse/Alternate shuffle cost tables. NFCI. Jan 4 2017, 4:19 AM

Updated having merged the Broadcast/Alternate/Reverse shuffle costs into a single set of LUTs.

andreadb added inline comments. Jan 4 2017, 10:59 AM

lib/Analysis/CostModel.cpp
520–522 ↗	(On Diff #83079)	Is this code still needed? r290810 introduced a check for `isZeroEltBroadcastVectorMask` at line 530.

RKSimon added inline comments. Jan 5 2017, 3:28 AM

lib/Analysis/CostModel.cpp
520–522 ↗	(On Diff #83079)	Thanks - I missed that for some reason. What should we do? Stick with the separate Broadcast matching helpers, or merge and use ShuffleVectorInst::isSplat?

andreadb added inline comments. Jan 5 2017, 3:53 AM

lib/Analysis/CostModel.cpp
520–522 ↗	(On Diff #83079)	Good question. I don't have a strong opinion since both approaches sound reasonable to me. In general, I like the idea of having separate helpers to check shuffle masks, and I am not particularly worried about the small code duplication in `isZeroEltBroadcastVectorMask` (since that helper is very simple).

delena added inline comments. Jan 5 2017, 4:15 AM

lib/Target/X86/X86TargetTransformInfo.cpp
686	Theoretically, v16i16 should not be more expensive than v32i8; vpshufb should work for v8i16 as it works for v16i8. (I don't know how it is implemented.)

RKSimon added inline comments. Jan 5 2017, 6:30 AM

lib/Target/X86/X86TargetTransformInfo.cpp
686	It does currently have the vpshuflw + vpshufd + vinsertf128 pattern - I'll look at improving this in shuffle combining.

Use existing broadcast shuffle matching

LGTM. Thanks Simon!

This revision is now accepted and ready to land. Jan 5 2017, 7:20 AM

Closed by commit rL291122: [CostModel][X86] Add support for broadcast shuffle costs (authored by RKSimon). Jan 5 2017, 8:06 AM

This revision was automatically updated to reflect the committed changes.

Diff 83232

lib/Target/X86/X86TargetTransformInfo.cpp

[599 lines not shown]
}		}

int X86TTIImpl::getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,		int X86TTIImpl::getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,
Type *SubTp) {		Type *SubTp) {
// 64-bit packed float vectors (v2f32) are widened to type v4f32.		// 64-bit packed float vectors (v2f32) are widened to type v4f32.
// 64-bit packed integer vectors (v2i32) are promoted to type v2i64.		// 64-bit packed integer vectors (v2i32) are promoted to type v2i64.
std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);		std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);

if (Kind == TTI::SK_Reverse || Kind == TTI::SK_Alternate) {		if (Kind == TTI::SK_Reverse || Kind == TTI::SK_Alternate ||
		Kind == TTI::SK_Broadcast) {
		// For Broadcasts we are splatting the first element from the first input
		// register, so only need to reference that input and all the output
		// registers are the same.
		if (Kind == TTI::SK_Broadcast)
		LT.first = 1;

static const CostTblEntry AVX512VBMIShuffleTbl[] = {		static const CostTblEntry AVX512VBMIShuffleTbl[] = {
{ TTI::SK_Reverse, MVT::v64i8, 1 }, // vpermb		{ TTI::SK_Reverse, MVT::v64i8, 1 }, // vpermb
{ TTI::SK_Reverse, MVT::v32i8, 1 } // vpermb		{ TTI::SK_Reverse, MVT::v32i8, 1 } // vpermb
};		};

if (ST->hasVBMI())		if (ST->hasVBMI())
if (const auto *Entry =		if (const auto *Entry =
CostTableLookup(AVX512VBMIShuffleTbl, Kind, LT.second))		CostTableLookup(AVX512VBMIShuffleTbl, Kind, LT.second))
return LT.first * Entry->Cost;		return LT.first * Entry->Cost;

static const CostTblEntry AVX512BWShuffleTbl[] = {		static const CostTblEntry AVX512BWShuffleTbl[] = {
		{ TTI::SK_Broadcast, MVT::v32i16, 1 }, // vpbroadcastw
		{ TTI::SK_Broadcast, MVT::v64i8, 1 }, // vpbroadcastb

{ TTI::SK_Reverse, MVT::v32i16, 1 }, // vpermw		{ TTI::SK_Reverse, MVT::v32i16, 1 }, // vpermw
{ TTI::SK_Reverse, MVT::v16i16, 1 }, // vpermw		{ TTI::SK_Reverse, MVT::v16i16, 1 }, // vpermw
{ TTI::SK_Reverse, MVT::v64i8, 6 } // vextracti64x4 + 2*vperm2i128		{ TTI::SK_Reverse, MVT::v64i8, 6 } // vextracti64x4 + 2*vperm2i128
// + 2*pshufb + vinserti64x4		// + 2*pshufb + vinserti64x4
};		};

if (ST->hasBWI())		if (ST->hasBWI())
if (const auto *Entry =		if (const auto *Entry =
CostTableLookup(AVX512BWShuffleTbl, Kind, LT.second))		CostTableLookup(AVX512BWShuffleTbl, Kind, LT.second))
return LT.first * Entry->Cost;		return LT.first * Entry->Cost;

static const CostTblEntry AVX512ShuffleTbl[] = {		static const CostTblEntry AVX512ShuffleTbl[] = {
		{ TTI::SK_Broadcast, MVT::v8f64, 1 }, // vbroadcastpd
		{ TTI::SK_Broadcast, MVT::v16f32, 1 }, // vbroadcastps
		{ TTI::SK_Broadcast, MVT::v8i64, 1 }, // vpbroadcastq
		{ TTI::SK_Broadcast, MVT::v16i32, 1 }, // vpbroadcastd

{ TTI::SK_Reverse, MVT::v8f64, 1 }, // vpermpd		{ TTI::SK_Reverse, MVT::v8f64, 1 }, // vpermpd
{ TTI::SK_Reverse, MVT::v16f32, 1 }, // vpermps		{ TTI::SK_Reverse, MVT::v16f32, 1 }, // vpermps
{ TTI::SK_Reverse, MVT::v8i64, 1 }, // vpermq		{ TTI::SK_Reverse, MVT::v8i64, 1 }, // vpermq
{ TTI::SK_Reverse, MVT::v16i32, 1 }, // vpermd		{ TTI::SK_Reverse, MVT::v16i32, 1 } // vpermd
};		};

if (ST->hasAVX512())		if (ST->hasAVX512())
if (const auto *Entry =		if (const auto *Entry =
CostTableLookup(AVX512ShuffleTbl, Kind, LT.second))		CostTableLookup(AVX512ShuffleTbl, Kind, LT.second))
return LT.first * Entry->Cost;		return LT.first * Entry->Cost;

static const CostTblEntry AVX2ShuffleTbl[] = {		static const CostTblEntry AVX2ShuffleTbl[] = {
		{ TTI::SK_Broadcast, MVT::v4f64, 1 }, // vbroadcastpd
		{ TTI::SK_Broadcast, MVT::v8f32, 1 }, // vbroadcastps
		{ TTI::SK_Broadcast, MVT::v4i64, 1 }, // vpbroadcastq
		{ TTI::SK_Broadcast, MVT::v8i32, 1 }, // vpbroadcastd
		{ TTI::SK_Broadcast, MVT::v16i16, 1 }, // vpbroadcastw
		{ TTI::SK_Broadcast, MVT::v32i8, 1 }, // vpbroadcastb

{ TTI::SK_Reverse, MVT::v4f64, 1 }, // vpermpd		{ TTI::SK_Reverse, MVT::v4f64, 1 }, // vpermpd
{ TTI::SK_Reverse, MVT::v8f32, 1 }, // vpermps		{ TTI::SK_Reverse, MVT::v8f32, 1 }, // vpermps
{ TTI::SK_Reverse, MVT::v4i64, 1 }, // vpermq		{ TTI::SK_Reverse, MVT::v4i64, 1 }, // vpermq
{ TTI::SK_Reverse, MVT::v8i32, 1 }, // vpermd		{ TTI::SK_Reverse, MVT::v8i32, 1 }, // vpermd
{ TTI::SK_Reverse, MVT::v16i16, 2 }, // vperm2i128 + pshufb		{ TTI::SK_Reverse, MVT::v16i16, 2 }, // vperm2i128 + pshufb
{ TTI::SK_Reverse, MVT::v32i8, 2 }, // vperm2i128 + pshufb		{ TTI::SK_Reverse, MVT::v32i8, 2 }, // vperm2i128 + pshufb

{ TTI::SK_Alternate, MVT::v16i16, 1 }, // vpblendw		{ TTI::SK_Alternate, MVT::v16i16, 1 }, // vpblendw
{ TTI::SK_Alternate, MVT::v32i8, 1 } // vpblendvb		{ TTI::SK_Alternate, MVT::v32i8, 1 } // vpblendvb
};		};

if (ST->hasAVX2())		if (ST->hasAVX2())
if (const auto *Entry = CostTableLookup(AVX2ShuffleTbl, Kind, LT.second))		if (const auto *Entry = CostTableLookup(AVX2ShuffleTbl, Kind, LT.second))
return LT.first * Entry->Cost;		return LT.first * Entry->Cost;

static const CostTblEntry AVX1ShuffleTbl[] = {		static const CostTblEntry AVX1ShuffleTbl[] = {
		{ TTI::SK_Broadcast, MVT::v4f64, 2 }, // vperm2f128 + vpermilpd
		{ TTI::SK_Broadcast, MVT::v8f32, 2 }, // vperm2f128 + vpermilps
		{ TTI::SK_Broadcast, MVT::v4i64, 2 }, // vperm2f128 + vpermilpd
		{ TTI::SK_Broadcast, MVT::v8i32, 2 }, // vperm2f128 + vpermilps
		{ TTI::SK_Broadcast, MVT::v16i16, 3 }, // vpshuflw + vpshufd + vinsertf128
		{ TTI::SK_Broadcast, MVT::v32i8, 2 }, // vpshufb + vinsertf128

{ TTI::SK_Reverse, MVT::v4f64, 2 }, // vperm2f128 + vpermilpd		{ TTI::SK_Reverse, MVT::v4f64, 2 }, // vperm2f128 + vpermilpd
{ TTI::SK_Reverse, MVT::v8f32, 2 }, // vperm2f128 + vpermilps		{ TTI::SK_Reverse, MVT::v8f32, 2 }, // vperm2f128 + vpermilps
{ TTI::SK_Reverse, MVT::v4i64, 2 }, // vperm2f128 + vpermilpd		{ TTI::SK_Reverse, MVT::v4i64, 2 }, // vperm2f128 + vpermilpd
{ TTI::SK_Reverse, MVT::v8i32, 2 }, // vperm2f128 + vpermilps		{ TTI::SK_Reverse, MVT::v8i32, 2 }, // vperm2f128 + vpermilps
{ TTI::SK_Reverse, MVT::v16i16, 4 }, // vextractf128 + 2*pshufb		{ TTI::SK_Reverse, MVT::v16i16, 4 }, // vextractf128 + 2*pshufb
// + vinsertf128		// + vinsertf128
{ TTI::SK_Reverse, MVT::v32i8, 4 }, // vextractf128 + 2*pshufb		{ TTI::SK_Reverse, MVT::v32i8, 4 }, // vextractf128 + 2*pshufb
// + vinsertf128		// + vinsertf128
[19 lines not shown]	static const CostTblEntry SSE41ShuffleTbl[] = {
{ TTI::SK_Alternate, MVT::v16i8, 1 } // pblendvb		{ TTI::SK_Alternate, MVT::v16i8, 1 } // pblendvb
};		};

if (ST->hasSSE41())		if (ST->hasSSE41())
if (const auto *Entry = CostTableLookup(SSE41ShuffleTbl, Kind, LT.second))		if (const auto *Entry = CostTableLookup(SSE41ShuffleTbl, Kind, LT.second))
return LT.first * Entry->Cost;		return LT.first * Entry->Cost;

static const CostTblEntry SSSE3ShuffleTbl[] = {		static const CostTblEntry SSSE3ShuffleTbl[] = {
		{ TTI::SK_Broadcast, MVT::v8i16, 1 }, // pshufb
		{ TTI::SK_Broadcast, MVT::v16i8, 1 }, // pshufb

{ TTI::SK_Reverse, MVT::v8i16, 1 }, // pshufb		{ TTI::SK_Reverse, MVT::v8i16, 1 }, // pshufb
{ TTI::SK_Reverse, MVT::v16i8, 1 }, // pshufb		{ TTI::SK_Reverse, MVT::v16i8, 1 }, // pshufb

{ TTI::SK_Alternate, MVT::v8i16, 3 }, // pshufb + pshufb + por		{ TTI::SK_Alternate, MVT::v8i16, 3 }, // pshufb + pshufb + por
{ TTI::SK_Alternate, MVT::v16i8, 3 } // pshufb + pshufb + por		{ TTI::SK_Alternate, MVT::v16i8, 3 } // pshufb + pshufb + por
};		};

if (ST->hasSSSE3())		if (ST->hasSSSE3())
if (const auto *Entry = CostTableLookup(SSSE3ShuffleTbl, Kind, LT.second))		if (const auto *Entry = CostTableLookup(SSSE3ShuffleTbl, Kind, LT.second))
return LT.first * Entry->Cost;		return LT.first * Entry->Cost;

static const CostTblEntry SSE2ShuffleTbl[] = {		static const CostTblEntry SSE2ShuffleTbl[] = {
		{ TTI::SK_Broadcast, MVT::v2f64, 1 }, // shufpd
		{ TTI::SK_Broadcast, MVT::v2i64, 1 }, // pshufd
		{ TTI::SK_Broadcast, MVT::v4i32, 1 }, // pshufd
		{ TTI::SK_Broadcast, MVT::v8i16, 2 }, // pshuflw + pshufd
		{ TTI::SK_Broadcast, MVT::v16i8, 3 }, // unpck + pshuflw + pshufd

{ TTI::SK_Reverse, MVT::v2f64, 1 }, // shufpd		{ TTI::SK_Reverse, MVT::v2f64, 1 }, // shufpd
{ TTI::SK_Reverse, MVT::v2i64, 1 }, // pshufd		{ TTI::SK_Reverse, MVT::v2i64, 1 }, // pshufd
{ TTI::SK_Reverse, MVT::v4i32, 1 }, // pshufd		{ TTI::SK_Reverse, MVT::v4i32, 1 }, // pshufd
{ TTI::SK_Reverse, MVT::v8i16, 3 }, // pshuflw + pshufhw + pshufd		{ TTI::SK_Reverse, MVT::v8i16, 3 }, // pshuflw + pshufhw + pshufd
{ TTI::SK_Reverse, MVT::v16i8, 9 }, // 2pshuflw + 2pshufhw		{ TTI::SK_Reverse, MVT::v16i8, 9 }, // 2pshuflw + 2pshufhw
// + 2pshufd + 2unpck + packus		// + 2pshufd + 2unpck + packus

{ TTI::SK_Alternate, MVT::v2i64, 1 }, // movsd		{ TTI::SK_Alternate, MVT::v2i64, 1 }, // movsd
{ TTI::SK_Alternate, MVT::v2f64, 1 }, // movsd		{ TTI::SK_Alternate, MVT::v2f64, 1 }, // movsd
{ TTI::SK_Alternate, MVT::v4i32, 2 }, // 2*shufps		{ TTI::SK_Alternate, MVT::v4i32, 2 }, // 2*shufps
{ TTI::SK_Alternate, MVT::v8i16, 3 }, // pand + pandn + por		{ TTI::SK_Alternate, MVT::v8i16, 3 }, // pand + pandn + por
{ TTI::SK_Alternate, MVT::v16i8, 3 } // pand + pandn + por		{ TTI::SK_Alternate, MVT::v16i8, 3 } // pand + pandn + por
};		};

if (ST->hasSSE2())		if (ST->hasSSE2())
if (const auto *Entry = CostTableLookup(SSE2ShuffleTbl, Kind, LT.second))		if (const auto *Entry = CostTableLookup(SSE2ShuffleTbl, Kind, LT.second))
return LT.first * Entry->Cost;		return LT.first * Entry->Cost;

static const CostTblEntry SSE1ShuffleTbl[] = {		static const CostTblEntry SSE1ShuffleTbl[] = {
		{ TTI::SK_Broadcast, MVT::v4f32, 1 }, // shufps
{ TTI::SK_Reverse, MVT::v4f32, 1 }, // shufps		{ TTI::SK_Reverse, MVT::v4f32, 1 }, // shufps
{ TTI::SK_Alternate, MVT::v4f32, 2 } // 2*shufps		{ TTI::SK_Alternate, MVT::v4f32, 2 } // 2*shufps
};		};

if (ST->hasSSE1())		if (ST->hasSSE1())
if (const auto *Entry = CostTableLookup(SSE1ShuffleTbl, Kind, LT.second))		if (const auto *Entry = CostTableLookup(SSE1ShuffleTbl, Kind, LT.second))
return LT.first * Entry->Cost;		return LT.first * Entry->Cost;

[1,467 lines not shown]

test/Analysis/CostModel/X86/shuffle-broadcast.ll

	[12 lines not shown]
	; CHECK-LABEL: 'test_vXf64'			; CHECK-LABEL: 'test_vXf64'
	define void @test_vXf64(<2 x double> %src128, <4 x double> %src256, <8 x double> %src512) {			define void @test_vXf64(<2 x double> %src128, <4 x double> %src256, <8 x double> %src512) {
	; SSE: cost of 1 {{.*}} %V128 = shufflevector			; SSE: cost of 1 {{.*}} %V128 = shufflevector
	; AVX: cost of 1 {{.*}} %V128 = shufflevector			; AVX: cost of 1 {{.*}} %V128 = shufflevector
	; AVX512: cost of 1 {{.*}} %V128 = shufflevector			; AVX512: cost of 1 {{.*}} %V128 = shufflevector
	%V128 = shufflevector <2 x double> %src128, <2 x double> undef, <2 x i32> zeroinitializer			%V128 = shufflevector <2 x double> %src128, <2 x double> undef, <2 x i32> zeroinitializer

	; SSE: cost of 1 {{.*}} %V256 = shufflevector			; SSE: cost of 1 {{.*}} %V256 = shufflevector
	; AVX: cost of 1 {{.*}} %V256 = shufflevector			; AVX1: cost of 2 {{.*}} %V256 = shufflevector
				; AVX2: cost of 1 {{.*}} %V256 = shufflevector
	; AVX512: cost of 1 {{.*}} %V256 = shufflevector			; AVX512: cost of 1 {{.*}} %V256 = shufflevector
	%V256 = shufflevector <4 x double> %src256, <4 x double> undef, <4 x i32> zeroinitializer			%V256 = shufflevector <4 x double> %src256, <4 x double> undef, <4 x i32> zeroinitializer

	; SSE: cost of 1 {{.*}} %V512 = shufflevector			; SSE: cost of 1 {{.*}} %V512 = shufflevector
	; AVX: cost of 1 {{.*}} %V512 = shufflevector			; AVX1: cost of 2 {{.*}} %V512 = shufflevector
				; AVX2: cost of 1 {{.*}} %V512 = shufflevector
	; AVX512: cost of 1 {{.*}} %V512 = shufflevector			; AVX512: cost of 1 {{.*}} %V512 = shufflevector
	%V512 = shufflevector <8 x double> %src512, <8 x double> undef, <8 x i32> zeroinitializer			%V512 = shufflevector <8 x double> %src512, <8 x double> undef, <8 x i32> zeroinitializer

	ret void			ret void
	}			}

				; CHECK-LABEL: 'test_vXi64'
				define void @test_vXi64(<2 x i64> %src128, <4 x i64> %src256, <8 x i64> %src512) {
				; SSE: cost of 1 {{.*}} %V128 = shufflevector
				; AVX: cost of 1 {{.*}} %V128 = shufflevector
				; AVX512: cost of 1 {{.*}} %V128 = shufflevector
				%V128 = shufflevector <2 x i64> %src128, <2 x i64> undef, <2 x i32> zeroinitializer

				; SSE: cost of 1 {{.*}} %V256 = shufflevector
				; AVX1: cost of 2 {{.*}} %V256 = shufflevector
				; AVX2: cost of 1 {{.*}} %V256 = shufflevector
				; AVX512: cost of 1 {{.*}} %V256 = shufflevector
				%V256 = shufflevector <4 x i64> %src256, <4 x i64> undef, <4 x i32> zeroinitializer

				; SSE: cost of 1 {{.*}} %V512 = shufflevector
				; AVX1: cost of 2 {{.*}} %V512 = shufflevector
				; AVX2: cost of 1 {{.*}} %V512 = shufflevector
				; AVX512: cost of 1 {{.*}} %V512 = shufflevector
				%V512 = shufflevector <8 x i64> %src512, <8 x i64> undef, <8 x i32> zeroinitializer

				ret void
				}

				; CHECK-LABEL: 'test_vXf32'
				define void @test_vXf32(<2 x float> %src64, <4 x float> %src128, <8 x float> %src256, <16 x float> %src512) {
				; SSE: cost of 1 {{.*}} %V64 = shufflevector
				; AVX: cost of 1 {{.*}} %V64 = shufflevector
				; AVX512: cost of 1 {{.*}} %V64 = shufflevector
				%V64 = shufflevector <2 x float> %src64, <2 x float> undef, <2 x i32> zeroinitializer

				; SSE: cost of 1 {{.*}} %V128 = shufflevector
				; AVX: cost of 1 {{.*}} %V128 = shufflevector
				; AVX512: cost of 1 {{.*}} %V128 = shufflevector
				%V128 = shufflevector <4 x float> %src128, <4 x float> undef, <4 x i32> zeroinitializer

				; SSE: cost of 1 {{.*}} %V256 = shufflevector
				; AVX1: cost of 2 {{.*}} %V256 = shufflevector
				; AVX2: cost of 1 {{.*}} %V256 = shufflevector
				; AVX512: cost of 1 {{.*}} %V256 = shufflevector
				%V256 = shufflevector <8 x float> %src256, <8 x float> undef, <8 x i32> zeroinitializer

				; SSE: cost of 1 {{.*}} %V512 = shufflevector
				; AVX1: cost of 2 {{.*}} %V512 = shufflevector
				; AVX2: cost of 1 {{.*}} %V512 = shufflevector
				; AVX512: cost of 1 {{.*}} %V512 = shufflevector
				%V512 = shufflevector <16 x float> %src512, <16 x float> undef, <16 x i32> zeroinitializer

				ret void
				}

				; CHECK-LABEL: 'test_vXi32'
				define void @test_vXi32(<2 x i32> %src64, <4 x i32> %src128, <8 x i32> %src256, <16 x i32> %src512) {
				; SSE: cost of 1 {{.*}} %V64 = shufflevector
				; AVX: cost of 1 {{.*}} %V64 = shufflevector
				; AVX512: cost of 1 {{.*}} %V64 = shufflevector
				%V64 = shufflevector <2 x i32> %src64, <2 x i32> undef, <2 x i32> zeroinitializer

				; SSE: cost of 1 {{.*}} %V128 = shufflevector
				; AVX: cost of 1 {{.*}} %V128 = shufflevector
				; AVX512: cost of 1 {{.*}} %V128 = shufflevector
				%V128 = shufflevector <4 x i32> %src128, <4 x i32> undef, <4 x i32> zeroinitializer

				; SSE: cost of 1 {{.*}} %V256 = shufflevector
				; AVX1: cost of 2 {{.*}} %V256 = shufflevector
				; AVX2: cost of 1 {{.*}} %V256 = shufflevector
				; AVX512: cost of 1 {{.*}} %V256 = shufflevector
				%V256 = shufflevector <8 x i32> %src256, <8 x i32> undef, <8 x i32> zeroinitializer

				; SSE: cost of 1 {{.*}} %V512 = shufflevector
				; AVX1: cost of 2 {{.*}} %V512 = shufflevector
				; AVX2: cost of 1 {{.*}} %V512 = shufflevector
				; AVX512: cost of 1 {{.*}} %V512 = shufflevector
				%V512 = shufflevector <16 x i32> %src512, <16 x i32> undef, <16 x i32> zeroinitializer

				ret void
				}

				; CHECK-LABEL: 'test_vXi16'
				define void @test_vXi16(<8 x i16> %src128, <16 x i16> %src256, <32 x i16> %src512) {
				; SSE2: cost of 2 {{.*}} %V128 = shufflevector
				; SSSE3: cost of 1 {{.*}} %V128 = shufflevector
				; SSE42: cost of 1 {{.*}} %V128 = shufflevector
				; AVX: cost of 1 {{.*}} %V128 = shufflevector
				; AVX512: cost of 1 {{.*}} %V128 = shufflevector
				%V128 = shufflevector <8 x i16> %src128, <8 x i16> undef, <8 x i32> zeroinitializer

				; SSE2: cost of 2 {{.*}} %V256 = shufflevector
				; SSSE3: cost of 1 {{.*}} %V256 = shufflevector
				; SSE42: cost of 1 {{.*}} %V256 = shufflevector
				; AVX1: cost of 3 {{.*}} %V256 = shufflevector
				; AVX2: cost of 1 {{.*}} %V256 = shufflevector
				; AVX512: cost of 1 {{.*}} %V256 = shufflevector
				%V256 = shufflevector <16 x i16> %src256, <16 x i16> undef, <16 x i32> zeroinitializer

				; SSE2: cost of 2 {{.*}} %V512 = shufflevector
				; SSSE3: cost of 1 {{.*}} %V512 = shufflevector
				; SSE42: cost of 1 {{.*}} %V512 = shufflevector
				; AVX1: cost of 3 {{.*}} %V512 = shufflevector
				; AVX2: cost of 1 {{.*}} %V512 = shufflevector
				; AVX512F: cost of 1 {{.*}} %V512 = shufflevector
				; AVX512BW: cost of 1 {{.*}} %V512 = shufflevector
				%V512 = shufflevector <32 x i16> %src512, <32 x i16> undef, <32 x i32> zeroinitializer

				ret void
				}

				; CHECK-LABEL: 'test_vXi8'
				define void @test_vXi8(<16 x i8> %src128, <32 x i8> %src256, <64 x i8> %src512) {
				; SSE2: cost of 3 {{.*}} %V128 = shufflevector
				; SSSE3: cost of 1 {{.*}} %V128 = shufflevector
				; SSE42: cost of 1 {{.*}} %V128 = shufflevector
				; AVX: cost of 1 {{.*}} %V128 = shufflevector
				; AVX512: cost of 1 {{.*}} %V128 = shufflevector
				%V128 = shufflevector <16 x i8> %src128, <16 x i8> undef, <16 x i32> zeroinitializer

				; SSE2: cost of 3 {{.*}} %V256 = shufflevector
				; SSSE3: cost of 1 {{.*}} %V256 = shufflevector
				; SSE42: cost of 1 {{.*}} %V256 = shufflevector
				; AVX1: cost of 2 {{.*}} %V256 = shufflevector
				; AVX2: cost of 1 {{.*}} %V256 = shufflevector
				; AVX512: cost of 1 {{.*}} %V256 = shufflevector
				%V256 = shufflevector <32 x i8> %src256, <32 x i8> undef, <32 x i32> zeroinitializer

				; SSE2: cost of 3 {{.*}} %V512 = shufflevector
				; SSSE3: cost of 1 {{.*}} %V512 = shufflevector
				; SSE42: cost of 1 {{.*}} %V512 = shufflevector
				; AVX1: cost of 2 {{.*}} %V512 = shufflevector
				; AVX2: cost of 1 {{.*}} %V512 = shufflevector
				; AVX512F: cost of 1 {{.*}} %V512 = shufflevector
				; AVX512BW: cost of 1 {{.*}} %V512 = shufflevector
				%V512 = shufflevector <64 x i8> %src512, <64 x i8> undef, <64 x i32> zeroinitializer

				ret void
				}

This is an archive of the discontinued LLVM Phabricator instance.

[CostModel][X86] Add support for broadcast shuffle costs
ClosedPublic
