This is an archive of the discontinued LLVM Phabricator instance.

Differential D100099

[X86][CostModel] Try to fix cost computation load/stores of non-power-of-two vectors
Closed, Public

Authored by lebedev.ri on Apr 8 2021, 5:10 AM.

Details

Reviewers

RKSimon
craig.topper
ABataev
spatel

Summary

Sometimes LV (the loop vectorizer) has to produce really wide vectors,
and sometimes their element counts are not powers of two.
As can be seen from the diff, the cost computation
is currently completely nonsensical in those cases.

I don't really know what I'm doing, but does this look better?

Instead of just scalarizing everything, split/factorize the wide vector
into a number of subvectors, each having a power-of-two number of elements,
and recurse to get the cost of the op on each subvector. Also, check how we'd
legalize each subvector, and if the legalized type is scalar,
also account for the scalarization cost.
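
To make the splitting concrete, here is a minimal standalone sketch of the decomposition described in the summary. It is not the patch itself: `floorPowerOf2`, `splitIntoPow2Chunks`, and `toyMemoryOpCost` are hypothetical names, and the toy cost simply assumes one unit per legal register that a chunk occupies, ignoring the scalarization overhead that the real code adds for scalar-legalized chunks.

```cpp
#include <cassert>
#include <vector>

// Largest power of two <= N, or 0 when N == 0.
// Hypothetical stand-in for LLVM's PowerOf2Floor().
static unsigned floorPowerOf2(unsigned N) {
  unsigned P = 0;
  for (unsigned Bit = 1; Bit != 0 && Bit <= N; Bit <<= 1)
    P = Bit;
  return P;
}

// Split NumElem into power-of-two chunk sizes, largest first.
std::vector<unsigned> splitIntoPow2Chunks(unsigned NumElem) {
  std::vector<unsigned> Chunks;
  unsigned Done = 0;
  while (Done < NumElem) {
    unsigned Factor = floorPowerOf2(NumElem - Done);
    Chunks.push_back(Factor);
    Done += Factor;
  }
  assert(Done == NumElem && "Processed wrong element count?");
  return Chunks;
}

// Toy cost (an assumption, not the X86 cost model): each chunk costs one
// unit per register of MaxLegalElts lanes that it occupies.
unsigned toyMemoryOpCost(unsigned NumElem, unsigned MaxLegalElts) {
  assert(MaxLegalElts > 0 && "need at least one lane per register");
  unsigned Cost = 0;
  for (unsigned Factor : splitIntoPow2Chunks(NumElem))
    Cost += (Factor + MaxLegalElts - 1) / MaxLegalElts; // round up
  return Cost;
}
```

Under these assumptions `splitIntoPow2Chunks(48)` gives `{32, 16}`, and with 8 lanes per register `toyMemoryOpCost(48, 8)` is 6, compared with 48 scalar operations if everything were scalarized.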

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lebedev.ri created this revision. Apr 8 2021, 5:10 AM

Herald added subscribers: pengfei, hiraditya. Apr 8 2021, 5:10 AM

lebedev.ri requested review of this revision. Apr 8 2021, 5:10 AM

ABataev added inline comments. Apr 8 2021, 5:33 AM

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
3222–3230	Why not just something like this:
unsigned Factor = 0;
for (; NumElem > 0; NumElem -= Factor) {
  Factor = PowerOf2Floor(NumElem);
  .....
}
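
The two loop shapes discussed in this thread can be compared side by side. The sketch below is a standalone illustration, not code from the patch: `pow2FloorOrZero`, `chunksCommaForm`, and `chunksBodyForm` are hypothetical names. The first form recomputes `Factor` in the loop condition via the comma operator; the second computes it at the top of the body, as suggested in the inline comment.

```cpp
#include <cassert>
#include <vector>

// Largest power of two <= N, or 0 when N == 0 (hypothetical helper).
static unsigned pow2FloorOrZero(unsigned N) {
  unsigned P = 0;
  for (unsigned Bit = 1; Bit != 0 && Bit <= N; Bit <<= 1)
    P = Bit;
  return P;
}

// Factor is assigned in the condition; the comma operator then evaluates
// "Left > 0" as the actual loop test.
std::vector<unsigned> chunksCommaForm(unsigned NumElem) {
  std::vector<unsigned> Out;
  for (unsigned Left = NumElem, Factor;
       Factor = pow2FloorOrZero(Left), Left > 0; Left -= Factor)
    Out.push_back(Factor);
  return Out;
}

// Factor is assigned as the first statement of the body instead.
std::vector<unsigned> chunksBodyForm(unsigned NumElem) {
  std::vector<unsigned> Out;
  unsigned Factor = 0;
  for (unsigned Left = NumElem; Left > 0; Left -= Factor) {
    Factor = pow2FloorOrZero(Left);
    Out.push_back(Factor);
  }
  return Out;
}
```

Both forms visit the same chunk sizes. The only behavioral difference in this sketch is that the comma form also evaluates the helper once with `Left == 0` before exiting, so the helper must tolerate a zero argument.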

(yes, this doesn't help LV itself, still looking into that)

Harbormaster completed remote builds in B97698: Diff 336063. Apr 8 2021, 6:05 AM

Simplify loop as proposed by @ABataev.
I guess there is a similar problem in X86TTIImpl::getMaskedMemoryOpCost().

I've figured out why this doesn't affect LV: it's because I should be fixing X86TTIImpl::getInterleavedMemoryOpCost().
Will look there too.

Harbormaster completed remote builds in B97772: Diff 336157. Apr 8 2021, 11:36 AM

ping @RKSimon - does this make general sense?
This matches the codegen at least: https://godbolt.org/z/TPrMKdnoa https://godbolt.org/z/d98sdMr3q https://godbolt.org/z/EcaKobEaW

RKSimon added inline comments. Apr 15 2021, 5:32 AM

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
3236	APInt::getBitsSet ?
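
The nit is about building the demanded-elements mask in one step rather than zero-initializing it and then setting a range. As far as I know, `APInt::getBitsSet(numBits, loBit, hiBit)` sets the half-open range `[loBit, hiBit)`. The sketch below uses a plain 64-bit integer as a stand-in for `APInt` so that it compiles without LLVM; `lowBits`, `bitsSet`, and `zeroThenSetBits` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Mask with the N lowest bits set (N may be 64).
static uint64_t lowBits(unsigned N) {
  return N >= 64 ? ~uint64_t{0} : ((uint64_t{1} << N) - 1);
}

// One-step construction: bits [Lo, Hi) set, everything else clear.
uint64_t bitsSet(unsigned Lo, unsigned Hi) {
  assert(Lo <= Hi && Hi <= 64 && "bit range out of bounds");
  return lowBits(Hi) & ~lowBits(Lo);
}

// Two-step construction mirroring "start from zero, then set the range".
uint64_t zeroThenSetBits(unsigned Lo, unsigned Hi) {
  assert(Lo <= Hi && Hi <= 64 && "bit range out of bounds");
  uint64_t Mask = 0;
  for (unsigned I = Lo; I < Hi; ++I)
    Mask |= uint64_t{1} << I;
  return Mask;
}
```

For the last element of a three-element vector, for example, both `bitsSet(2, 3)` and `zeroThenSetBits(2, 3)` produce the mask `0b100`.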

@RKSimon thank you for taking a look!

Rebased, addressed nit.

Cheers @lebedev.ri, the premise seems fine, and the costs are a lot more sensible

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
3222–3230	+1 Having Factor updated in the condition as well as being used in increment block is difficult to grok

lebedev.ri added inline comments. Apr 15 2021, 6:24 AM

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
3222–3230	Note that i have addressed @ABataev's comment, it was about earlier patch version: https://reviews.llvm.org/D100099?id=336063#change-OQEVJvBxQbDZ

Harbormaster completed remote builds in B98892: Diff 337728. Apr 15 2021, 6:57 AM

This is probably still imprecise for small remainder sub-vectors.
E.g. load cost for <3 x float> w/ 8 byte alignment should be 1: https://godbolt.org/z/r3ncvMvaf

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
3222–3230	... or are you asking me to move the `Factor` computation out of the condition?
3234	Hm, i wonder if we also need to add a `getShuffleCost(SK_ExtractSubvector, ...)` cost (with the wide vector type widened to the next power of two).

In D100099#2691945, @lebedev.ri wrote:

This is probably still imprecise for small remainder sub-vectors.
E.g. load cost for <3 x float> w/ 8 byte alignment should be 1: https://godbolt.org/z/r3ncvMvaf

That assumes 16-byte alignment though - for v3f32 the more likely (element) 4-byte alignment will be worse - not sure whether we have much coverage of different alignments (or whether it's worth it).

The costs are a lot better than what they were, but there are a few cases that seem off - if I had to guess I'd say we're not completely getting the split types matching the legalized types?

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
3222–3230	If you can, but it looks like making the dependencies between NumElemLeft and Factor simpler won't be easy - I'm happy for you to keep it as is.
llvm/test/Analysis/CostModel/X86/load_store.ll
26–39	cost = 2 ? SSE max size is 128-bit vector - so no subvector extraction cost - <3 x double> seems to get it right?

In D100099#2692024, @RKSimon wrote:

In D100099#2691945, @lebedev.ri wrote:

This is probably still imprecise for small remainder sub-vectors.
E.g. load cost for <3 x float> w/ 8 byte alignment should be 1: https://godbolt.org/z/r3ncvMvaf

That assumes 16-byte alignment though - for v3f32 the more likely (element) 4-byte alignment will be worse - not sure whether we have much coverage of different alignments (or whether it's worth it).

Sorry, i meant 16, yes.
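
The alignment point can be illustrated with a small standalone sketch of how the known alignment narrows from one sub-vector to the next; the patch does this by calling `commonAlignment` with the byte size of the chunk just processed. Everything below is a simplified model with hypothetical names (`pow2FloorOf`, `commonAlign`, `ChunkAccess`, `chunkAlignments`), using plain integers instead of LLVM's `Align` type.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Largest power of two <= N, or 0 when N == 0 (hypothetical helper).
static unsigned pow2FloorOf(unsigned N) {
  unsigned P = 0;
  for (unsigned Bit = 1; Bit != 0 && Bit <= N; Bit <<= 1)
    P = Bit;
  return P;
}

// Largest power of two dividing both A and Offset: the alignment that is
// still guaranteed Offset bytes past an A-aligned address.
uint64_t commonAlign(uint64_t A, uint64_t Offset) {
  assert(A != 0 && "alignment must be non-zero");
  uint64_t Bits = A | Offset;
  return Bits & (~Bits + 1); // isolate the lowest set bit
}

struct ChunkAccess {
  unsigned NumElts; // power-of-two element count of this sub-vector
  uint64_t Align;   // alignment known for this sub-vector's address
};

// Walk the power-of-two chunks of a NumElem-element vector and record the
// alignment each chunk can rely on.
std::vector<ChunkAccess> chunkAlignments(unsigned NumElem, unsigned EltBytes,
                                         uint64_t BaseAlign) {
  std::vector<ChunkAccess> Out;
  uint64_t Align = BaseAlign;
  for (unsigned Left = NumElem; Left > 0;) {
    unsigned Factor = pow2FloorOf(Left);
    Out.push_back({Factor, Align});
    // The next chunk starts Factor * EltBytes bytes later.
    Align = commonAlign(Align, uint64_t{Factor} * EltBytes);
    Left -= Factor;
  }
  return Out;
}
```

In this model a `<3 x float>` access with 16-byte alignment splits into a 2-element chunk at alignment 16 and a 1-element chunk at alignment 8, whereas with 4-byte alignment both chunks only get alignment 4.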

The costs are a lot better than what they were, but there are a few cases that seem off - if I had to guess I'd say we're not completely getting the split types matching the legalized types?

llvm/test/Analysis/CostModel/X86/load_store.ll
26–39	This happens because for vectors we assume that extracting 0'th element of a FP vector is free: https://github.com/llvm/llvm-project/blob/4f42d873c20291077f5a1ed37b102330d505f00d/llvm/lib/Target/X86/X86TargetTransformInfo.cpp#L3055-L3065

RKSimon mentioned this in rG2a1a2f5733b0: [CostModel][X86] Add fully aligned load/store tests. Apr 16 2021, 2:39 AM

I've added some 64-byte aligned tests, for which we can probably add special-case load costs - please can you rebase?

For the remaining edge cases - would you prefer to fix them here or iterate in follow up patches? I'd be happy to accept this as is (after rebase) - it's a massive improvement and makes it much easier to tweak as we go on.

llvm/test/Analysis/CostModel/X86/load_store.ll
26–39	Ah - of course - we're hitting the issue that the extract + store costs are treated separately, but for f32/f64 extract_0 is free.
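
The "extract_0 is free" rule can be modelled with a toy function. This is a deliberately simplified sketch, not the logic in X86TargetTransformInfo.cpp: it assumes the element index is taken modulo the lane count of the legalized register, that a floating-point element in lane 0 costs nothing to extract, and that every other extract costs one unit. `toyExtractCost` is a hypothetical name.

```cpp
#include <cassert>

// Toy cost of extracting element Index from a vector that legalizes into
// registers of LanesPerReg lanes each (assumed unit cost when not free).
unsigned toyExtractCost(bool IsFloatingPoint, unsigned Index,
                        unsigned LanesPerReg) {
  assert(LanesPerReg > 0 && "need at least one lane per register");
  unsigned IndexInReg = Index % LanesPerReg;
  // A floating-point scalar already sits in lane 0 of its register.
  if (IsFloatingPoint && IndexInReg == 0)
    return 0;
  return 1;
}
```

Under these assumptions, element 2 of `<3 x double>` lands in lane 0 of a second 2-lane register and is free, while element 2 of `<3 x float>` sits in lane 2 of a 4-lane register and is charged, which is one plausible reading of why the two types end up with different store costs in the test.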

@RKSimon thank you for taking a look!
Rebased.
Since the next changes would reduce the cost, not increase it, i think they should go into next patch.

Harbormaster completed remote builds in B99137: Diff 338060. Apr 16 2021, 5:20 AM

LGTM

This revision is now accepted and ready to land. Apr 16 2021, 5:22 AM

In D100099#2694440, @RKSimon wrote:

LGTM

Thank you for the review.
I'll look into improving subvector load costs.

This was committed in b06c55a6986e0e1d571663eec507664013b22f00 with a proper `Differential Revision:` tag; I'm not sure why Phabricator didn't pick it up.

srj added a subscriber: srj. Apr 20 2021, 1:31 PM

This commit appears to have injected a hang (or > 60s delay) in certain Halide tests when using the JIT (the "hang" being inside LLVM code, but JIT-generated code). I'm working on finding a small repro case.

In D100099#2702921, @srj wrote:

This commit appears to have injected a hang (or > 60s delay) in certain Halide tests when using the JIT (the "hang" being inside LLVM code, but JIT-generated code). I'm working on finding a small repro case.

Hm, interesting. Could you please also check if that still reproduces with D100684 (awaiting review) ?

In D100099#2702949, @lebedev.ri wrote:

Hm, interesting. Could you please also check if that still reproduces with D100684 (awaiting review) ?

Testing now

In D100099#2703023, @srj wrote:

In D100099#2702949, @lebedev.ri wrote:

Hm, interesting. Could you please also check if that still reproduces with D100684 (awaiting review) ?

Testing now

Sadly, no, still hangs indefinitely with that patched in

In D100099#2703026, @srj wrote:

In D100099#2703023, @srj wrote:

In D100099#2702949, @lebedev.ri wrote:

Hm, interesting. Could you please also check if that still reproduces with D100684 (awaiting review) ?

Testing now

Sadly, no, still hangs indefinitely with that patched in

Thanks for checking. Awaiting the reproducer.

OK, here's a file that I think will repro it:

bad.ll (37 KB)

To see the hang, run `~/llvm-13-install/bin/llc -mcpu=penryn -o - -O3 bad.ll`. (Note that the CPU / microarchitecture matters here; for us the hang occurred only when specializing for SSE4.1 SIMD.)

In D100099#2703043, @srj wrote:

OK, here's a file that I think will repro it:
bad.ll (37 KB)

To see the hang, run `~/llvm-13-install/bin/llc -mcpu=penryn -o - -O3 bad.ll`. (Note that the CPU / microarchitecture matters here; for us the hang occurred only when specializing for SSE4.1 SIMD.)

Hm, that's because this isn't the patch you're looking for <jedi hand-wave>.
Please rebisect, reverting this doesn't make that hang go away for me.

In D100099#2703102, @lebedev.ri wrote:

Please rebisect, reverting this doesn't make that hang go away for me.

Huh, weird. I'll recheck.

In D100099#2703102, @lebedev.ri wrote:

Please rebisect, reverting this doesn't make that hang go away for me.

When reverting, how did you resolve the (many) conflicts in load_store.ll? I'm not familiar with that code and am unsure of the correct resolution.

Update:

rerunning bisect still points at this commit.
it looks like the reproducer I uploaded does indeed hang llc quite a ways back -- even llc from an LLVM12 build hangs in the same way.
running llc at -O0 succeeds, but -O1 or higher still fails.
Looks like we're triggering exponential runtime somewhere under TargetLowering::SimplifyDemandedBits.

Given this behavior, perhaps this is a pre-existing bug in the optimizer, which was never triggered before until this change "unmasked" it?

I'm trying to figure out if there's a way to get you a repro case without requiring you to build Halide locally (since the hang occurs when Halide uses MCJIT).

If you don't mind pulling and building Halide locally, it is easy to repro; here are steps in case you want to try (assumes a linux env):

git clone https://github.com/halide/Halide
cd Halide
git checkout srj/hang-repro # This is a branch I made to simplify the repro
export LLVM_CONFIG=/path/to/llvm/install/dir
export HL_JIT_TARGET=x86-64-linux-sse41
make -j$(nproc) correctness_vector_reductions

Note that this target runs with a 60-second timeout (which is generous, as it should normally be well under a second).

FYI, I've opened a bug (https://bugs.llvm.org/show_bug.cgi?id=50049) for the llc hang, as it seems like a clear problem regardless of whether this commit is involved.

If it hangs all the way back, then the bisection *can not* point to this revision, and this could not have triggered/exposed this.

Since we can reproduce the hang with only llc, we know that this patch is not the real source of the bug. I'm reducing a test based on PR50049.

In D100099#2704217, @lebedev.ri wrote:

If it hangs all the way back, then the bisection *can not* point to this revision, and this could not have triggered/exposed this.

Using llc on the .ll file I uploaded does indeed hang quite a ways back.

Using Halide to drive MCJIT, however, is not the same code path, and while I can't explain it, that path *does* trigger a hang starting at this revision.

As mentioned above, it doesn't seem implausible that the changes made here happened to uncover a pre-existing bug that we didn't know about before.

Commit a511b55cfd67acecc58f1ccf1f3ce5c917dc1d90 fixes both the llc hang and MCJIT-specific hang. Thanks for the attention.

lebedev.ri mentioned this in D102990: [X86][Costmodel] getMaskedMemoryOpCost(): don't scalarize non-power-of-two vectors with legal element type. May 23 2021, 12:00 PM

lebedev.ri mentioned this in rGc666208f6380: [X86][Costmodel] getMaskedMemoryOpCost(): don't scalarize non-power-of-two…. May 24 2021, 10:10 AM

Revision Contents

Path	Size
llvm/lib/Target/X86/X86TargetTransformInfo.cpp	66 lines
llvm/test/Analysis/CostModel/X86/load_store.ll	192 lines

Diff 336157

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

     if (auto *SI = dyn_cast_or_null<StoreInst>(I)) {
       if (auto *GEP = dyn_cast<GetElementPtrInst>(SI->getPointerOperand())) {
         if (!all_of(GEP->indices(), [](Value *V) { return isa<Constant>(V); }))
           return TTI::TCC_Basic * 2;
       }
     }
     return TTI::TCC_Basic;
   }
 
-  // Handle non-power-of-two vectors such as <3 x float>
+  assert((Opcode == Instruction::Load || Opcode == Instruction::Store) &&
+         "Invalid Opcode");
+  // Type legalization can't handle structs
+  if (TLI->getValueType(DL, Src, true) == MVT::Other)
+    return BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace,
+                                  CostKind);
+
+  // Handle non-power-of-two vectors such as <3 x float> and <48 x i16>
   if (auto *VTy = dyn_cast<FixedVectorType>(Src)) {
-    unsigned NumElem = VTy->getNumElements();
-
-    // Handle a few common cases:
-    // <3 x float>
-    if (NumElem == 3 && VTy->getScalarSizeInBits() == 32)
-      // Cost = 64 bit store + extract + 32 bit store.
-      return 3;
-
-    // <3 x double>
-    if (NumElem == 3 && VTy->getScalarSizeInBits() == 64)
-      // Cost = 128 bit store + unpack + 64 bit store.
-      return 3;
-
-    // Assume that all other non-power-of-two numbers are scalarized.
-    if (!isPowerOf2_32(NumElem)) {
-      APInt DemandedElts = APInt::getAllOnesValue(NumElem);
-      int Cost = BaseT::getMemoryOpCost(Opcode, VTy->getScalarType(), Alignment,
-                                        AddressSpace, CostKind);
-      int SplitCost = getScalarizationOverhead(VTy, DemandedElts,
-                                               Opcode == Instruction::Load,
-                                               Opcode == Instruction::Store);
-      return NumElem * Cost + SplitCost;
-    }
-  }
-
-  // Type legalization can't handle structs
-  if (TLI->getValueType(DL, Src, true) == MVT::Other)
-    return BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace,
-                                  CostKind);
+    const unsigned NumElem = VTy->getNumElements();
+    if (!isPowerOf2_32(NumElem)) {
+      // Factorize NumElem into sum of power-of-two.
+      int Cost = 0;
+      unsigned NumElemDone = 0;
+      for (unsigned NumElemLeft = NumElem, Factor;
+           Factor = PowerOf2Floor(NumElemLeft), NumElemLeft > 0;
+           NumElemLeft -= Factor) {
+        Type *SubTy = FixedVectorType::get(VTy->getScalarType(), Factor);
+        unsigned SubTyBytes = SubTy->getPrimitiveSizeInBits() / 8;
+
+        Cost +=
+            getMemoryOpCost(Opcode, SubTy, Alignment, AddressSpace, CostKind);
+
+        std::pair<int, MVT> LST = TLI->getTypeLegalizationCost(DL, SubTy);
+        if (!LST.second.isVector()) {
+          APInt DemandedElts = APInt::getNullValue(NumElem);
+          DemandedElts.setBits(NumElemDone, NumElemDone + Factor);
+          Cost += getScalarizationOverhead(VTy, DemandedElts,
+                                           Opcode == Instruction::Load,
+                                           Opcode == Instruction::Store);
+        }
+
+        NumElemDone += Factor;
+        Alignment = commonAlignment(Alignment.valueOrOne(), SubTyBytes);
+      }
+      assert(NumElemDone == NumElem && "Processed wrong element count?");
+      return Cost;
+    }
+  }
 
   // Legalize the type.
   std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Src);
-  assert((Opcode == Instruction::Load || Opcode == Instruction::Store) &&
-         "Invalid Opcode");
 
   // Each load/store unit costs 1.
   int Cost = LT.first * 1;
 
   // This isn't exactly right. We're using slow unaligned 32-byte accesses as a
   // proxy for a double-pumped AVX memory interface such as on Sandybridge.
   if (LT.second.getStoreSize() == 32 && ST->isUnalignedMem32Slow())
     Cost *= 2;

llvm/test/Analysis/CostModel/X86/load_store.ll

 ; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store i128 undef, i128* undef, align 4
 ; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <4 x i16> undef, <4 x i16>* undef, align 4
 ; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <4 x i32> undef, <4 x i32>* undef, align 4
 ; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <4 x i64> undef, <4 x i64>* undef, align 4
 ; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <8 x i16> undef, <8 x i16>* undef, align 4
 ; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <8 x i32> undef, <8 x i32>* undef, align 4
 ; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: store <8 x i64> undef, <8 x i64>* undef, align 4
 ; SSE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: store <3 x float> undef, <3 x float>* undef, align 4
-; SSE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: store <3 x double> undef, <3 x double>* undef, align 4
-; SSE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: store <3 x i32> undef, <3 x i32>* undef, align 4
+; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <3 x double> undef, <3 x double>* undef, align 4
+; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: store <3 x i32> undef, <3 x i32>* undef, align 4
 ; SSE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: store <3 x i64> undef, <3 x i64>* undef, align 4
-; SSE-NEXT: Cost Model: Found an estimated cost of 13 for instruction: store <5 x i32> undef, <5 x i32>* undef, align 4
-; SSE-NEXT: Cost Model: Found an estimated cost of 12 for instruction: store <5 x i64> undef, <5 x i64>* undef, align 4
-; SSE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: store <5 x i16> undef, <5 x i16>* undef, align 4
-; SSE-NEXT: Cost Model: Found an estimated cost of 12 for instruction: store <6 x i16> undef, <6 x i16>* undef, align 4
-; SSE-NEXT: Cost Model: Found an estimated cost of 14 for instruction: store <7 x i16> undef, <7 x i16>* undef, align 4
-; SSE-NEXT: Cost Model: Found an estimated cost of 22 for instruction: store <11 x i16> undef, <11 x i16>* undef, align 4
-; SSE-NEXT: Cost Model: Found an estimated cost of 24 for instruction: store <12 x i16> undef, <12 x i16>* undef, align 4
-; SSE-NEXT: Cost Model: Found an estimated cost of 26 for instruction: store <13 x i16> undef, <13 x i16>* undef, align 4
-; SSE-NEXT: Cost Model: Found an estimated cost of 46 for instruction: store <23 x i16> undef, <23 x i16>* undef, align 4
-; SSE-NEXT: Cost Model: Found an estimated cost of 48 for instruction: store <24 x i16> undef, <24 x i16>* undef, align 4
-; SSE-NEXT: Cost Model: Found an estimated cost of 50 for instruction: store <25 x i16> undef, <25 x i16>* undef, align 4
-; SSE-NEXT: Cost Model: Found an estimated cost of 94 for instruction: store <47 x i16> undef, <47 x i16>* undef, align 4
-; SSE-NEXT: Cost Model: Found an estimated cost of 96 for instruction: store <48 x i16> undef, <48 x i16>* undef, align 4
-; SSE-NEXT: Cost Model: Found an estimated cost of 98 for instruction: store <49 x i16> undef, <49 x i16>* undef, align 4
+; SSE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: store <5 x i32> undef, <5 x i32>* undef, align 4
+; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: store <5 x i64> undef, <5 x i64>* undef, align 4
+; SSE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: store <5 x i16> undef, <5 x i16>* undef, align 4
+; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <6 x i16> undef, <6 x i16>* undef, align 4
+; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: store <7 x i16> undef, <7 x i16>* undef, align 4
+; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: store <11 x i16> undef, <11 x i16>* undef, align 4
+; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <12 x i16> undef, <12 x i16>* undef, align 4
+; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: store <13 x i16> undef, <13 x i16>* undef, align 4
+; SSE-NEXT: Cost Model: Found an estimated cost of 6 for instruction: store <23 x i16> undef, <23 x i16>* undef, align 4
+; SSE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: store <24 x i16> undef, <24 x i16>* undef, align 4
+; SSE-NEXT: Cost Model: Found an estimated cost of 5 for instruction: store <25 x i16> undef, <25 x i16>* undef, align 4
+; SSE-NEXT: Cost Model: Found an estimated cost of 9 for instruction: store <47 x i16> undef, <47 x i16>* undef, align 4
+; SSE-NEXT: Cost Model: Found an estimated cost of 6 for instruction: store <48 x i16> undef, <48 x i16>* undef, align 4
+; SSE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: store <49 x i16> undef, <49 x i16>* undef, align 4
 ; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
 ;
 ; AVX-LABEL: 'stores'
 ; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store i8 undef, i8* undef, align 4
 ; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store i16 undef, i16* undef, align 4
 ; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store i32 undef, i32* undef, align 4
 ; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store i64 undef, i64* undef, align 4
 ; AVX-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store i128 undef, i128* undef, align 4
 ; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <4 x i16> undef, <4 x i16>* undef, align 4
 ; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <4 x i32> undef, <4 x i32>* undef, align 4
 ; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <4 x i64> undef, <4 x i64>* undef, align 4
 ; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <8 x i16> undef, <8 x i16>* undef, align 4
 ; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <8 x i32> undef, <8 x i32>* undef, align 4
 ; AVX-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <8 x i64> undef, <8 x i64>* undef, align 4
 ; AVX-NEXT: Cost Model: Found an estimated cost of 3 for instruction: store <3 x float> undef, <3 x float>* undef, align 4
 ; AVX-NEXT: Cost Model: Found an estimated cost of 3 for instruction: store <3 x double> undef, <3 x double>* undef, align 4
 ; AVX-NEXT: Cost Model: Found an estimated cost of 3 for instruction: store <3 x i32> undef, <3 x i32>* undef, align 4
-; AVX-NEXT: Cost Model: Found an estimated cost of 3 for instruction: store <3 x i64> undef, <3 x i64>* undef, align 4
-; AVX-NEXT: Cost Model: Found an estimated cost of 11 for instruction: store <5 x i32> undef, <5 x i32>* undef, align 4
-; AVX-NEXT: Cost Model: Found an estimated cost of 12 for instruction: store <5 x i64> undef, <5 x i64>* undef, align 4
-; AVX-NEXT: Cost Model: Found an estimated cost of 10 for instruction: store <5 x i16> undef, <5 x i16>* undef, align 4
-; AVX-NEXT: Cost Model: Found an estimated cost of 12 for instruction: store <6 x i16> undef, <6 x i16>* undef, align 4
-; AVX-NEXT: Cost Model: Found an estimated cost of 14 for instruction: store <7 x i16> undef, <7 x i16>* undef, align 4
-; AVX-NEXT: Cost Model: Found an estimated cost of 25 for instruction: store <11 x i16> undef, <11 x i16>* undef, align 4
-; AVX-NEXT: Cost Model: Found an estimated cost of 28 for instruction: store <12 x i16> undef, <12 x i16>* undef, align 4
-; AVX-NEXT: Cost Model: Found an estimated cost of 31 for instruction: store <13 x i16> undef, <13 x i16>* undef, align 4
-; AVX-NEXT: Cost Model: Found an estimated cost of 54 for instruction: store <23 x i16> undef, <23 x i16>* undef, align 4
-; AVX-NEXT: Cost Model: Found an estimated cost of 56 for instruction: store <24 x i16> undef, <24 x i16>* undef, align 4
-; AVX-NEXT: Cost Model: Found an estimated cost of 59 for instruction: store <25 x i16> undef, <25 x i16>* undef, align 4
-; AVX-NEXT: Cost Model: Found an estimated cost of 117 for instruction: store <47 x i16> undef, <47 x i16>* undef, align 4
-; AVX-NEXT: Cost Model: Found an estimated cost of 120 for instruction: store <48 x i16> undef, <48 x i16>* undef, align 4
-; AVX-NEXT: Cost Model: Found an estimated cost of 122 for instruction: store <49 x i16> undef, <49 x i16>* undef, align 4
+; AVX-NEXT: Cost Model: Found an estimated cost of 4 for instruction: store <3 x i64> undef, <3 x i64>* undef, align 4
+; AVX-NEXT: Cost Model: Found an estimated cost of 4 for instruction: store <5 x i32> undef, <5 x i32>* undef, align 4
+; AVX-NEXT: Cost Model: Found an estimated cost of 3 for instruction: store <5 x i64> undef, <5 x i64>* undef, align 4
+; AVX-NEXT: Cost Model: Found an estimated cost of 3 for instruction: store <5 x i16> undef, <5 x i16>* undef, align 4
+; AVX-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <6 x i16> undef, <6 x i16>* undef, align 4
+; AVX-NEXT: Cost Model: Found an estimated cost of 4 for instruction: store <7 x i16> undef, <7 x i16>* undef, align 4
+; AVX-NEXT: Cost Model: Found an estimated cost of 5 for instruction: store <11 x i16> undef, <11 x i16>* undef, align 4
+; AVX-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <12 x i16> undef, <12 x i16>* undef, align 4
+; AVX-NEXT: Cost Model: Found an estimated cost of 5 for instruction: store <13 x i16> undef, <13 x i16>* undef, align 4
+; AVX-NEXT: Cost Model: Found an estimated cost of 5 for instruction: store <23 x i16> undef, <23 x i16>* undef, align 4
+; AVX-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <24 x i16> undef, <24 x i16>* undef, align 4
+; AVX-NEXT: Cost Model: Found an estimated cost of 5 for instruction: store <25 x i16> undef, <25 x i16>* undef, align 4
+; AVX-NEXT: Cost Model: Found an estimated cost of 8 for instruction: store <47 x i16> undef, <47 x i16>* undef, align 4
+; AVX-NEXT: Cost Model: Found an estimated cost of 3 for instruction: store <48 x i16> undef, <48 x i16>* undef, align 4
+; AVX-NEXT: Cost Model: Found an estimated cost of 5 for instruction: store <49 x i16> undef, <49 x i16>* undef, align 4
 ; AVX-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
 ;
	; AVX512-LABEL: 'stores'			; AVX512-LABEL: 'stores'
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store i8 undef, i8* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store i8 undef, i8* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store i16 undef, i16* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store i16 undef, i16* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store i32 undef, i32* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store i32 undef, i32* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store i64 undef, i64* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store i64 undef, i64* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store i128 undef, i128* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store i128 undef, i128* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <4 x i16> undef, <4 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <4 x i16> undef, <4 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <4 x i32> undef, <4 x i32>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <4 x i32> undef, <4 x i32>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <4 x i64> undef, <4 x i64>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <4 x i64> undef, <4 x i64>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <8 x i16> undef, <8 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <8 x i16> undef, <8 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <8 x i32> undef, <8 x i32>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <8 x i32> undef, <8 x i32>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <8 x i64> undef, <8 x i64>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: store <8 x i64> undef, <8 x i64>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 3 for instruction: store <3 x float> undef, <3 x float>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 3 for instruction: store <3 x float> undef, <3 x float>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 3 for instruction: store <3 x double> undef, <3 x double>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 3 for instruction: store <3 x double> undef, <3 x double>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 3 for instruction: store <3 x i32> undef, <3 x i32>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 3 for instruction: store <3 x i32> undef, <3 x i32>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 3 for instruction: store <3 x i64> undef, <3 x i64>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 4 for instruction: store <3 x i64> undef, <3 x i64>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 11 for instruction: store <5 x i32> undef, <5 x i32>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 4 for instruction: store <5 x i32> undef, <5 x i32>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 13 for instruction: store <5 x i64> undef, <5 x i64>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 4 for instruction: store <5 x i64> undef, <5 x i64>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 10 for instruction: store <5 x i16> undef, <5 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 3 for instruction: store <5 x i16> undef, <5 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 12 for instruction: store <6 x i16> undef, <6 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <6 x i16> undef, <6 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 14 for instruction: store <7 x i16> undef, <7 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 4 for instruction: store <7 x i16> undef, <7 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 25 for instruction: store <11 x i16> undef, <11 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 5 for instruction: store <11 x i16> undef, <11 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 28 for instruction: store <12 x i16> undef, <12 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <12 x i16> undef, <12 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 31 for instruction: store <13 x i16> undef, <13 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 5 for instruction: store <13 x i16> undef, <13 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 61 for instruction: store <23 x i16> undef, <23 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 6 for instruction: store <23 x i16> undef, <23 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 64 for instruction: store <24 x i16> undef, <24 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <24 x i16> undef, <24 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 67 for instruction: store <25 x i16> undef, <25 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 5 for instruction: store <25 x i16> undef, <25 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 125 for instruction: store <47 x i16> undef, <47 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 7 for instruction: store <47 x i16> undef, <47 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 128 for instruction: store <48 x i16> undef, <48 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: store <48 x i16> undef, <48 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 131 for instruction: store <49 x i16> undef, <49 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 5 for instruction: store <49 x i16> undef, <49 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	store i8 undef, i8* undef, align 4			store i8 undef, i8* undef, align 4
	store i16 undef, i16* undef, align 4			store i16 undef, i16* undef, align 4
	store i32 undef, i32* undef, align 4			store i32 undef, i32* undef, align 4
	store i64 undef, i64* undef, align 4			store i64 undef, i64* undef, align 4
	store i128 undef, i128* undef, align 4			store i128 undef, i128* undef, align 4

	[40 lines hidden]
	; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %4 = load i64, i64* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %4 = load i64, i64* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %5 = load i128, i128* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %5 = load i128, i128* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %6 = load <2 x i32>, <2 x i32>* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %6 = load <2 x i32>, <2 x i32>* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %7 = load <4 x i32>, <4 x i32>* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %7 = load <4 x i32>, <4 x i32>* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %8 = load <8 x i32>, <8 x i32>* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %8 = load <8 x i32>, <8 x i32>* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %9 = load <2 x i64>, <2 x i64>* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %9 = load <2 x i64>, <2 x i64>* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %10 = load <4 x i64>, <4 x i64>* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %10 = load <4 x i64>, <4 x i64>* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %11 = load <8 x i64>, <8 x i64>* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %11 = load <8 x i64>, <8 x i64>* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %12 = load <3 x float>, <3 x float>* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %12 = load <3 x float>, <3 x float>* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %13 = load <3 x double>, <3 x double>* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %13 = load <3 x double>, <3 x double>* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %14 = load <3 x i32>, <3 x i32>* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %14 = load <3 x i32>, <3 x i32>* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %15 = load <3 x i64>, <3 x i64>* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %15 = load <3 x i64>, <3 x i64>* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %16 = load <5 x i32>, <5 x i32>* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %16 = load <5 x i32>, <5 x i32>* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %17 = load <5 x i64>, <5 x i64>* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %17 = load <5 x i64>, <5 x i64>* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %18 = load <5 x i16>, <5 x i16>* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %18 = load <5 x i16>, <5 x i16>* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %19 = load <6 x i16>, <6 x i16>* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %19 = load <6 x i16>, <6 x i16>* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %20 = load <7 x i16>, <7 x i16>* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %20 = load <7 x i16>, <7 x i16>* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %21 = load <11 x i16>, <11 x i16>* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %21 = load <11 x i16>, <11 x i16>* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %22 = load <12 x i16>, <12 x i16>* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %22 = load <12 x i16>, <12 x i16>* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 26 for instruction: %23 = load <13 x i16>, <13 x i16>* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %23 = load <13 x i16>, <13 x i16>* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 46 for instruction: %24 = load <23 x i16>, <23 x i16>* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %24 = load <23 x i16>, <23 x i16>* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %25 = load <24 x i16>, <24 x i16>* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %25 = load <24 x i16>, <24 x i16>* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 50 for instruction: %26 = load <25 x i16>, <25 x i16>* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %26 = load <25 x i16>, <25 x i16>* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 94 for instruction: %27 = load <47 x i16>, <47 x i16>* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %27 = load <47 x i16>, <47 x i16>* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %28 = load <48 x i16>, <48 x i16>* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %28 = load <48 x i16>, <48 x i16>* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 98 for instruction: %29 = load <49 x i16>, <49 x i16>* undef, align 4			; SSE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %29 = load <49 x i16>, <49 x i16>* undef, align 4
	; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; SSE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	; AVX-LABEL: 'loads'			; AVX-LABEL: 'loads'
	; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %1 = load i8, i8* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %1 = load i8, i8* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %2 = load i16, i16* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %2 = load i16, i16* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %3 = load i32, i32* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %3 = load i32, i32* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %4 = load i64, i64* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %4 = load i64, i64* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %5 = load i128, i128* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %5 = load i128, i128* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %6 = load <2 x i32>, <2 x i32>* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %6 = load <2 x i32>, <2 x i32>* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %7 = load <4 x i32>, <4 x i32>* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %7 = load <4 x i32>, <4 x i32>* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %8 = load <8 x i32>, <8 x i32>* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %8 = load <8 x i32>, <8 x i32>* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %9 = load <2 x i64>, <2 x i64>* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %9 = load <2 x i64>, <2 x i64>* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %10 = load <4 x i64>, <4 x i64>* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %10 = load <4 x i64>, <4 x i64>* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %11 = load <8 x i64>, <8 x i64>* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %11 = load <8 x i64>, <8 x i64>* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %12 = load <3 x float>, <3 x float>* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %12 = load <3 x float>, <3 x float>* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %13 = load <3 x double>, <3 x double>* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %13 = load <3 x double>, <3 x double>* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %14 = load <3 x i32>, <3 x i32>* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %14 = load <3 x i32>, <3 x i32>* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %15 = load <3 x i64>, <3 x i64>* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %15 = load <3 x i64>, <3 x i64>* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %16 = load <5 x i32>, <5 x i32>* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %16 = load <5 x i32>, <5 x i32>* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %17 = load <5 x i64>, <5 x i64>* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %17 = load <5 x i64>, <5 x i64>* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %18 = load <5 x i16>, <5 x i16>* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %18 = load <5 x i16>, <5 x i16>* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %19 = load <6 x i16>, <6 x i16>* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %19 = load <6 x i16>, <6 x i16>* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %20 = load <7 x i16>, <7 x i16>* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %20 = load <7 x i16>, <7 x i16>* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %21 = load <11 x i16>, <11 x i16>* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %21 = load <11 x i16>, <11 x i16>* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 27 for instruction: %22 = load <12 x i16>, <12 x i16>* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %22 = load <12 x i16>, <12 x i16>* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %23 = load <13 x i16>, <13 x i16>* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %23 = load <13 x i16>, <13 x i16>* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 49 for instruction: %24 = load <23 x i16>, <23 x i16>* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %24 = load <23 x i16>, <23 x i16>* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 51 for instruction: %25 = load <24 x i16>, <24 x i16>* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %25 = load <24 x i16>, <24 x i16>* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 55 for instruction: %26 = load <25 x i16>, <25 x i16>* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %26 = load <25 x i16>, <25 x i16>* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 101 for instruction: %27 = load <47 x i16>, <47 x i16>* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %27 = load <47 x i16>, <47 x i16>* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 102 for instruction: %28 = load <48 x i16>, <48 x i16>* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %28 = load <48 x i16>, <48 x i16>* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 105 for instruction: %29 = load <49 x i16>, <49 x i16>* undef, align 4			; AVX-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %29 = load <49 x i16>, <49 x i16>* undef, align 4
	; AVX-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	; AVX512-LABEL: 'loads'			; AVX512-LABEL: 'loads'
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %1 = load i8, i8* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %1 = load i8, i8* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %2 = load i16, i16* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %2 = load i16, i16* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %3 = load i32, i32* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %3 = load i32, i32* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %4 = load i64, i64* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %4 = load i64, i64* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %5 = load i128, i128* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %5 = load i128, i128* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %6 = load <2 x i32>, <2 x i32>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %6 = load <2 x i32>, <2 x i32>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %7 = load <4 x i32>, <4 x i32>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %7 = load <4 x i32>, <4 x i32>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %8 = load <8 x i32>, <8 x i32>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %8 = load <8 x i32>, <8 x i32>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %9 = load <2 x i64>, <2 x i64>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %9 = load <2 x i64>, <2 x i64>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %10 = load <4 x i64>, <4 x i64>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %10 = load <4 x i64>, <4 x i64>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %11 = load <8 x i64>, <8 x i64>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %11 = load <8 x i64>, <8 x i64>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %12 = load <3 x float>, <3 x float>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %12 = load <3 x float>, <3 x float>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %13 = load <3 x double>, <3 x double>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %13 = load <3 x double>, <3 x double>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %14 = load <3 x i32>, <3 x i32>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %14 = load <3 x i32>, <3 x i32>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %15 = load <3 x i64>, <3 x i64>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %15 = load <3 x i64>, <3 x i64>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %16 = load <5 x i32>, <5 x i32>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %16 = load <5 x i32>, <5 x i32>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %17 = load <5 x i64>, <5 x i64>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %17 = load <5 x i64>, <5 x i64>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %18 = load <5 x i16>, <5 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %18 = load <5 x i16>, <5 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %19 = load <6 x i16>, <6 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %19 = load <6 x i16>, <6 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %20 = load <7 x i16>, <7 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %20 = load <7 x i16>, <7 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 25 for instruction: %21 = load <11 x i16>, <11 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %21 = load <11 x i16>, <11 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 27 for instruction: %22 = load <12 x i16>, <12 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %22 = load <12 x i16>, <12 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 29 for instruction: %23 = load <13 x i16>, <13 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %23 = load <13 x i16>, <13 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 50 for instruction: %24 = load <23 x i16>, <23 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %24 = load <23 x i16>, <23 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 51 for instruction: %25 = load <24 x i16>, <24 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %25 = load <24 x i16>, <24 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 55 for instruction: %26 = load <25 x i16>, <25 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %26 = load <25 x i16>, <25 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 101 for instruction: %27 = load <47 x i16>, <47 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %27 = load <47 x i16>, <47 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 102 for instruction: %28 = load <48 x i16>, <48 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %28 = load <48 x i16>, <48 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 106 for instruction: %29 = load <49 x i16>, <49 x i16>* undef, align 4			; AVX512-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %29 = load <49 x i16>, <49 x i16>* undef, align 4
	; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef			; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
	;			;
	load i8, i8* undef, align 4			load i8, i8* undef, align 4
	load i16, i16* undef, align 4			load i16, i16* undef, align 4
	load i32, i32* undef, align 4			load i32, i32* undef, align 4
	load i64, i64* undef, align 4			load i64, i64* undef, align 4
	load i128, i128* undef, align 4			load i128, i128* undef, align 4

	[34 lines hidden]