This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Cost for a constant buildvector.
ClosedPublic

Authored by ABataev on Jun 2 2022, 7:56 AM.

Details

Reviewers

RKSimon
craig.topper

Commits

rG0e7ed32c7136: [SLP]Cost for a constant buildvector.

Summary

Usually, constant buildvector results in a vector load from a
constant/data pool.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60,130 ms	x64 debian > AddressSanitizer-x86_64-linux-dynamic.TestCases::scariness_score_test.cpp
	60,150 ms	x64 debian > AddressSanitizer-x86_64-linux.TestCases::scariness_score_test.cpp
	60,530 ms	x64 debian > Clang.Driver::fsanitize.c

Event Timeline

ABataev created this revision. Jun 2 2022, 7:56 AM

Herald added a project: Restricted Project. Jun 2 2022, 7:56 AM

Herald added subscribers: vporpo, StephenFan, frasercrmck and 21 others.

ABataev requested review of this revision. Jun 2 2022, 7:56 AM

Herald added subscribers: pcwang-thead, MaskRay.

Harbormaster completed remote builds in B167520: Diff 433753. Jun 2 2022, 8:33 AM

Some unfortunate regressions

llvm/test/Transforms/SLPVectorizer/X86/pr46983.ll
146 ↗	(On Diff #433753)	Regression https://github.com/llvm/llvm-project/issues/46327

vdmitrie added a subscriber: vdmitrie. Jun 2 2022, 8:49 AM

vdmitrie added inline comments.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5908	This estimate should be a bit more complicated. Here are the things that can additionally be considered:
For scalar floating-point ops, a constant operand is normally loaded from memory too.
If it is an operand of an instruction that becomes an immediate (like a shift value) and is a splat, the cost is zero.
For a scalar integer op, a constant operand is typically an immediate, so this estimate works in most cases, but there is an exception: 64-bit operations on a 32-bit target.
That should be taken into account too.
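
A minimal sketch of the three considerations listed in this comment, written as standalone C++ rather than against the LLVM API. Every name here (ScalarConstUse, scalarConstCost, vectorConstCost) and the unit costs 0 and 1 are assumptions made for illustration; the patch itself does not contain this code.

```cpp
#include <cassert>

// Hypothetical, simplified description of one constant operand.
struct ScalarConstUse {
  bool IsFloatingPoint;   // FP constants are normally loaded from memory
  bool FoldsToImmediate;  // e.g. a shift amount encoded in the instruction
  bool IsSplat;           // same value in every lane of the would-be vector
  unsigned OpBits;        // width of the scalar operation, in bits
  unsigned NativeIntBits; // widest natively supported integer, in bits
};

// Illustrative unit costs (assumed); real values come from the target.
const int FreeCost = 0;
const int LoadCost = 1;

// Cost of the constant operand in the scalar code being replaced.
inline int scalarConstCost(const ScalarConstUse &U) {
  if (U.IsFloatingPoint)
    return LoadCost; // scalar FP constant: loaded from memory too
  if (U.OpBits > U.NativeIntBits)
    return LoadCost; // e.g. 64-bit op on a 32-bit target: not a plain immediate
  return FreeCost;   // typical integer immediate
}

// Cost of the constant vector operand after vectorization.
inline int vectorConstCost(const ScalarConstUse &U) {
  if (U.FoldsToImmediate && U.IsSplat)
    return FreeCost; // splat shift amount and similar: cost is zero
  return LoadCost;   // otherwise assume a constant-pool load
}
```

With these assumed costs, a 32-bit integer add on a 64-bit target pays 0 for its scalar constant and 1 for the vector constant, while a splat shift amount pays 0 in both forms.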

Address comments.

Herald added subscribers: jsji, luke957, pengfei, arichardson. Jun 3 2022, 12:46 PM

ABataev added inline comments. Jun 3 2022, 12:51 PM

llvm/test/Transforms/SLPVectorizer/X86/pr46983.ll
146 ↗	(On Diff #433753)	llvm-mca reports throughputs:
For scalar code - 5.5
AVX vector - 8.0
AVX2 vector - 5.0
https://godbolt.org/z/rEc74dxza

xbolva00 added inline comments. Jun 3 2022, 1:03 PM

llvm/test/Transforms/SLPVectorizer/X86/pr46983.ll
146 ↗	(On Diff #433753)	Ah, right, these checks are for avx.

Harbormaster completed remote builds in B167776: Diff 434112. Jun 3 2022, 1:57 PM

icost.ll (1 KB)

Test case for collection.

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
3788	Note that it is free for splat shift value only. https://godbolt.org/z/KMf6sW5n3

In D126885#3557032, @vdmitrie wrote:

icost.ll (1 KB)

Test case for collection.

mca shows that these 2 instructions have the same cost, so it actually does not matter. It is probably worth adding some other instructions which can load params directly from memory for x86.

vdmitrie added inline comments. Jun 3 2022, 2:23 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5912	Just wondering: is it possible for UserTreeIndices to be empty here? AFAIU it can be for the root only, but constants do not seed the vtree. Also, if the alternate opcodes are shl/shr but the shift value is a splat, it can still be an immediate for both of them.
5931	drop it?

ABataev added inline comments. Jun 3 2022, 2:42 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5912	It can be empty if the constants are the reduced values in reduction ops. That's why there is a TODO above.
5931	What do you mean?

vdmitrie added inline comments. Jun 3 2022, 2:47 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5912	Okay. Although I believe it is not the SLP vectorizer's job to do constant folding.
5931	Drop the extra definition of ScalarCost. Otherwise the loop at line 5807 updates the variable from line 5802, but it is not used. Line 5811 will subtract the one defined at line 5795.

ABataev added inline comments. Jun 3 2022, 2:50 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5931	Ah, yes, sure.

vdmitrie added inline comments. Jun 3 2022, 3:14 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5936	Doesn't this interface already assume that a constant is a legal immediate? I was trying to explore this too, and I found that it does not seem to correctly cover 32-bit targets, specifically for 64-bit operations. Ideally we should have an interface that tells whether an immediate is a legal imm operand for a target, but I have not found anything like that. One way to figure this out (which I found, and which may be wrong) is: when the condition DL->getTypeStoreSizeInBits(ScalarTy) > DL->getLargestLegalIntTypeSizeInBits() is true, we cannot assume the operand is a legal immediate.
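
The condition proposed in this comment can be sketched without LLVM as a plain comparison of two bit widths. TargetInts and mayBeLegalImmediate are hypothetical names standing in for the two DataLayout queries quoted above, and the check is only the conservative heuristic the reviewer describes, not a real legality test.

```cpp
#include <cassert>

// Hypothetical stand-in for the target's data layout.
struct TargetInts {
  unsigned LargestLegalIntBits; // e.g. 32 on a 32-bit target
};

// Conservative check: a scalar whose store size exceeds the widest legal
// integer cannot be assumed to be a legal immediate operand.
inline bool mayBeLegalImmediate(unsigned TypeStoreSizeInBits,
                                const TargetInts &T) {
  return TypeStoreSizeInBits <= T.LargestLegalIntBits;
}
```

For example, a 64-bit constant on a target whose largest legal integer is 32 bits fails the check, so its cost should not be treated as zero.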

ABataev added inline comments. Jun 3 2022, 3:24 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5936	I'll check it.

ABataev added inline comments. Jun 6 2022, 9:02 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5912	Do you suggest hiding it in getConstBuildVectorInstrCost and returning the difference? Or just adding a new member function?

vdmitrie added inline comments. Jun 6 2022, 11:34 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5912	Alternate opcodes are SLP vectorizer specific. For that reason, trying to sink that logic into the TTI interface does not look like the right thing to do. But outlining this whole new code into a separate member is a good idea. What sounds weird to me is that constants may seed the vtree for a reduction. Although that is not directly related to this patch, you are placing workarounds for it here. IMO it is impractical to run constant reductions through the SLP vectorizer machinery. Probably, to make the workaround for that issue simpler in this patch, add an early return: if (E->UserTreeIndices.empty()) return 0; Otherwise it will return the memory-op cost for a foldable operation.

Address comments

Herald added subscribers: kbarton, nemanjai. Jun 8 2022, 8:19 AM

ABataev added inline comments. Jun 8 2022, 8:20 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

5912

What sounds weird for me is that constants may seed vtree for reduction.

InstCombiner and other passes are not always able to handle them (or require some extra work and compile time). E.g:

define i32 @foo(i32 %v, i32 %a) {
  %s1 = add i32 %v, 1
  %s2 = add i32 %a, 2
  %s3 = add i32 %s1, %s2
  %s11 = add i32 %v, %a
  %s31 = add i32 %s3, %s11
  %s4 = add i32 %v, 3
  %s5 = add i32 %a, 4
  %s6 = add i32 %s4, %s5
  %s7 = add i32 %s31, %s6
  ret i32 %s7
}

SLP is able to transform it to:

define i32 @foo(i32 %v, i32 %a) {
  %1 = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> <i32 4, i32 3, i32 2, i32 1>)
  %op.rdx = add i32 %a, %a
  %op.rdx1 = add i32 %a, %v
  %op.rdx2 = add i32 %v, %v
  %op.rdx3 = add i32 %op.rdx, %op.rdx1
  %op.rdx4 = add i32 %op.rdx3, %op.rdx2
  %op.rdx5 = add i32 %1, %op.rdx4
  ret i32 %op.rdx5
}

which can then be optimized to:

%1 = i32 10

But I agree, that it requires improvement. We don't need to estimate the cost and emit reduction here. I have a patch that improves it. Need to work on it for some time, though.
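
As a sanity check of the example above, the two IR functions can be mirrored in plain C++ and compared. This is only an illustration: fooScalar and fooReduced are hypothetical names, and unsigned 32-bit arithmetic is used so that wrap-around matches the plain IR add.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <numeric>

// The original scalar chain from @foo.
inline uint32_t fooScalar(uint32_t v, uint32_t a) {
  uint32_t s1 = v + 1, s2 = a + 2, s3 = s1 + s2;
  uint32_t s11 = v + a, s31 = s3 + s11;
  uint32_t s4 = v + 3, s5 = a + 4, s6 = s4 + s5;
  return s31 + s6;
}

// The SLP-transformed shape: the constants are gathered into one vector
// and reduced, the remaining scalar operands are summed separately.
inline uint32_t fooReduced(uint32_t v, uint32_t a) {
  const std::array<uint32_t, 4> consts = {{4, 3, 2, 1}};
  // Stands in for llvm.vector.reduce.add; the sum of the lanes is 10.
  uint32_t r = std::accumulate(consts.begin(), consts.end(), uint32_t{0});
  uint32_t rdx = a + a, rdx1 = a + v, rdx2 = v + v;
  uint32_t rdx3 = rdx + rdx1;
  uint32_t rdx4 = rdx3 + rdx2;
  return r + rdx4;
}
```

Both functions compute 3*v + 3*a + 10, which is why the reduction over the constant vector can be folded to the single constant 10.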

Harbormaster completed remote builds in B168583: Diff 435177. Jun 8 2022, 10:16 AM

Rebase

Herald added a subscriber: nlopes. Jun 14 2022, 1:41 PM

Harbormaster completed remote builds in B169816: Diff 436911. Jun 14 2022, 3:16 PM

RKSimon added inline comments. Jun 16 2022, 3:32 AM

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
3788	Yes, vector shifts must be splats without AVX2 or XOP
3808	Instruction::Add/Sub? Also, we'd need to allow Idx == 0 || Idx == 1 for commutable ops.

shchenz added a subscriber: shchenz. Jun 16 2022, 3:53 AM

ABataev added inline comments. Jun 16 2022, 10:17 AM

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
3808	I excluded Add/Sub here because scalar Add/Sub with an immediate has a lower cost than the vector Add/Sub (0.2-0.33 vs ~0.5). We can add it later; currently there is no such analysis in getIntImmCostInst.

Address comments

Harbormaster completed remote builds in B170350: Diff 437675. Jun 16 2022, 2:32 PM

Ping!

RKSimon added inline comments. Jun 22 2022, 10:10 AM

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
3801	XOP can memory fold from Idx == 0 as well.

Removed getConstantBuildVectorCost, the analysis for constant values already exists in getArithmeticInstrCost. Added support for const operand for stores in getMemoryOpCost function.

Harbormaster completed remote builds in B171421: Diff 439156. Jun 22 2022, 3:09 PM

Rebase

Harbormaster completed remote builds in B171896: Diff 439821. Jun 24 2022, 11:47 AM

Rebase

Harbormaster completed remote builds in B172460: Diff 440594. Jun 28 2022, 7:13 AM

Rebase

Harbormaster completed remote builds in B180695: Diff 451880. Aug 11 2022, 10:35 AM

RKSimon added inline comments. Aug 15 2022, 4:15 AM

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
4085	Just do this once before the (!VTy || !LT.second.isVector()) check?

Address comment

Harbormaster completed remote builds in B181265: Diff 452648. Aug 15 2022, 7:30 AM

LGTM - it might be worth splitting the refactoring of adding the OperandValueKind arg to getMemoryOpCost? That way any fall out from the cost changes are more localised.

This revision is now accepted and ready to land. Aug 17 2022, 10:11 AM

In D126885#3729272, @RKSimon wrote:

LGTM - it might be worth splitting the refactoring of adding the OperandValueKind arg to getMemoryOpCost? That way any fall out from the cost changes are more localised.

Ok, will commit in a separate patch.

ABataev mentioned this in rGd53e245951f8: [COST][NFC]Introduce OperandValueKind in getMemoryOpCost, NFC.. Aug 19 2022, 7:34 AM

This revision was landed with ongoing or failed builds. Aug 19 2022, 8:04 AM

Closed by commit rG0e7ed32c7136: [SLP]Cost for a constant buildvector. (authored by ABataev).

This revision was automatically updated to reflect the committed changes.

ABataev added a commit: rG0e7ed32c7136: [SLP]Cost for a constant buildvector..

Coming into this a bit late.

I stumbled into this myself when looking at the impact of SLP on RISC-V. I think this is addressing an important problem, but I'm really not happy with the structure of the change that landed.

We have a general problem here of needing to account for cost of a constant build vector. This change ended up being specific to stores of constant build vectors, but the same basic problem still exists if e.g. you have a load, add constant-build-vector, and store sequence which gets vectorized. The problem here is not in any way related to the cost of the store; it's related to the cost of materializing the value to be stored.

There's an additional problem that the cost model added for RISC-V is way overly simplistic. It's out of sync with the existing build vector lowering code, and thus will result in costs which differ from the actual lowering chosen. More importantly, the interface chosen in this patch prevents a more sophisticated cost model from being used.

I think we need to undo this, and return to the getConstBuildVectorInstrCost approach used in early versions of this patch. There was a mention of existing build vector costing in getConstBuildVectorInstrCost, but I can't find this in generic code. Can you point me to the code you were referring to?

p.s. I used the word "undo" specifically to avoid "revert". I'm not asking the change be reverted, simply that we work in the direction of a better interface overall. Doing that will have the effect of semantically reverting the landed change, but I'm not picky about the order of operations here.

In D126885#3735457, @reames wrote:

I think we need to undo this, and return to the getConstBuildVectorInstrCost approach used in early versions of this patch. There was a mention of existing build vector costing in getConstBuildVectorInstrCost, but I can't find this in generic code. Can you point me to the code you were referring to?

Check the cost model of arithmetic instructions etc, they already include the cost analysis for constant values.

p.s. I used the word "undo" specifically to avoid "revert". I'm not asking the change be reverted, simply that we work in the direction of a better interface overall. Doing that will have the effect of semantically reverting the landed change, but I'm not picky about the order of operations here.

I tried initially to implement it, but our cost model already includes the cost of constants/constant buildvectors for many operations. It requires significant rework of TTI and some extra investigation because we need to account for cross-dependencies between constants and the operations. And I'm not sure if it would be better/easier to implement; it requires some extra (re)design investigation.

In D126885#3735500, @ABataev wrote:

Check the cost model of arithmetic instructions etc, they already include the cost analysis for constant values.

I think I found the code you're referring to in X86TTIImpl::getArithmeticInstrCost. I'd summarize this code as we have various alternate cost tables which seem to assume one constant splat operand or sometimes just one constant operand gets folded into the instruction. I don't know enough about avx512 instruction encoding to reason about this, but I will accept that it exists. Though it does look very weird to me that *all* constants are assumed to be folded into the encoding? Whatever, out of scope for this discussion.

I tried initially to implement it, but our cost model already includes the cost of constants/constant buildvectors for many operations. It requires significant rework of TTI and some extra investigation because we need to account for cross-dependencies between constants and the operations. And I'm not sure if it would be better/easier to implement; it requires some extra (re)design investigation.

Ok, so I see the concern here. I'm not thrilled with the conclusion, but I think I agree that the current state of the art is having each operation reason about the cost of the constant materialization independently.

Given that, I see why you took the approach you did here.

However, we're still left with the problem that the current interface is insufficient for RISC-V. On the vector side, we can generate various non-splat sequences (e.g. vid and friends) at low cost. As such, the current expressibility of the interface isn't really sufficient.
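
To illustrate what a cheap non-splat sequence can look like, here is a small standalone sketch that recognizes a constant vector of the form Start + i * Step, the kind of pattern a vid-based lowering can produce. StepMatch and matchStepSequence are hypothetical names, this is not the RISC-V backend's matching code, and it assumes the element differences do not overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: whether the elements form Start + i * Step.
struct StepMatch {
  bool Matched;
  int64_t Start;
  int64_t Step;
};

// Checks that consecutive elements differ by one common step.
// A splat is reported as a sequence with Step == 0.
inline StepMatch matchStepSequence(const std::vector<int64_t> &Elts) {
  if (Elts.size() < 2)
    return {false, 0, 0};
  const int64_t Step = Elts[1] - Elts[0];
  for (std::size_t I = 2; I < Elts.size(); ++I)
    if (Elts[I] - Elts[I - 1] != Step)
      return {false, 0, 0};
  return {true, Elts[0], Step};
}
```

An interface that only distinguishes "uniform constant" from "non-uniform constant" cannot tell <0, 1, 2, 3> apart from an arbitrary constant vector, which is the expressibility gap described here.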

I see two possible paths, both with downsides. I'm curious what you think:

Extend the OperandValueProperties enum with a bunch more options for describing build vectors. I don't really see the semantic distinction between OperandValueProperties and OperandValueKind, so we'd probably end up merging them into a single info struct with a bunch more properties on it. This arguably works more naturally with scalable vectors, but it's a bunch of complexity.
Add the getConstBuildVectorInstrCost interface anyways. Document the contract as being to return zero cost when the constant could fold into the using instruction. Existing backends which don't need the additional expressibility continue with the old scheme, RISC-V uses this approach to cost build vectors instead (i.e. arithmetic cost et al don't include constant mat costs).
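
The contract described in the second option could look roughly like the sketch below: zero when the constant folds into its user, otherwise a materialization cost. All names (ConstBuildVector, UserInfo, constBuildVectorCost) and the cost values 1 and 2 are assumptions for illustration only; no such interface is confirmed by this thread.

```cpp
#include <cassert>

// Hypothetical, simplified inputs; none of these are LLVM APIs.
struct ConstBuildVector {
  bool IsSplat;        // all lanes hold the same constant
  bool IsStepSequence; // lanes form Start + i * Step (vid-like)
};

struct UserInfo {
  bool FoldsSplatOperand; // the using instruction can encode a splat constant
};

// Returns 0 when the constant folds into the using instruction,
// otherwise an assumed cost of materializing the vector.
inline int constBuildVectorCost(const ConstBuildVector &C, const UserInfo &U) {
  if (C.IsSplat && U.FoldsSplatOperand)
    return 0; // folded into the user: nothing to materialize
  if (C.IsSplat || C.IsStepSequence)
    return 1; // assumed cheap: a single splat or vid-style instruction
  return 2;   // assumed: address setup plus a constant-pool load
}
```

Under this contract, a backend that already folds constant costs into its arithmetic tables would simply never be asked, while a backend such as RISC-V could price the build vector on its own.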

As I said, both approaches have some obvious downsides. If you have an alternate idea, definitely open to hearing it.

Also, to be clear, I've accepted that this patch is reasonable. I'm asking now about future direction for my own work, not asking for you to volunteer for any of the above. :)

In D126885#3735638, @reames wrote:

Ok, so I see the concern here. I'm not thrilled with the conclusion, but I think I agree that the current state of the art is having each operation reason about the cost of the constant materialization independently.

Given that, I see why you took the approach you did here.

However, we're still left with the problem that the current interface is insufficient for RISC-V. On the vector side, we can generate various non-splat sequences (e.g. vid and friends) at low cost. As such, the current expressibility of the interface isn't really sufficient.

I see two possible paths, both with downsides. I'm curious what you think:

Extend the OperandValueProperties enum with a bunch more options for describing build vectors. I don't really see the semantic distinction between OperandValueProperties and OperandValueKind, so we'd probably end up merging them into a single info struct with a bunch more properties on it. This arguably works more naturally with scalable vectors, but it's a bunch of complexity.

Add the getConstBuildVectorInstrCost interface anyways. Document the contract as being to return zero cost when the constant could fold into the using instruction. Existing backends which don't need the additional expressibility continue with the old scheme, RISC-V uses this approach to cost build vectors instead (i.e. arithmetic cost et al don't include constant mat costs).

As I said, both approaches have some obvious downsides. If you have an alternate idea, definitely open to hearing it.

I would do both (in some way) as a first step. Introduce getConstBuildVectorInstrCost (local to the RISC-V TTI interface) and use it in TTI functions (I mean in getArithmeticInstrCost, getMemoryOpCost, etc.) for better constant build vector cost estimation (if the user provides operands or OperandValueProperties). Later we can make it public for all TTI interfaces. Thoughts?

Also, to be clear, I've accepted that this patch is reasonable. I'm asking now about future direction for my own work, not asking for you to volunteer for any of the above. :)

I understand, no problem.

Refactoring OperandValueKind/Properties from enums into a single properties list has come up several times (IIRC KnownNeverZero/KnownNeverNegative properties and even KnownBits/SignBits/Min+Max have been mentioned as useful for some cases).

TBH just merging them as an initial cleanup (and improving TargetTransformInfo::getOperandInfo) would be worth it and would make it easier for future changes.

In D126885#3735695, @ABataev wrote:

I would do both (in some way) as a first step. Introduce getConstBuildVectorInstrCost (local to the RISC-V TTI interface) and use it in TTI functions (I mean in getArithmeticInstrCost, getMemoryOpCost, etc.) for better constant build vector cost estimation (if the user provides operands or OperandValueProperties). Later we can make it public for all TTI interfaces. Thoughts?

I split out the costing code in 59960e8d with plans to extend it. However, this isn't quite the same as getConstBuildVectorInstrCost as we don't have the actual values forming the build vector. For that, we'd need to significantly change the interface of TTI to pass through all of the Values making up the build vector.

In D126885#3735736, @RKSimon wrote:

Refactoring OperandValueKind/Properties from enums into a single properties list has come up several times (IIRC KnownNeverZero/KnownNeverNegative properties and even KnownBits/SignBits/Min+Max have been mentioned as useful for some cases).

TBH just merging them as an initial cleanup (and improving TargetTransformInfo::getOperandInfo) would be worth it and would make it easier for future changes.

Given this has come up many times, I went ahead and did it. Changes to wrap both existing properties enums in a class have been plumbed through all of TTI and client code. If we want to add new properties, it should be pretty straightforward to do so.

I'm still not sure this is the right direction overall - as opposed to costing the actual constant value - but I'm still toying with ideas here. One thing I did notice is that basically only X86 costs immediate operands with the current approach. So this really isn't "existing targets do X"; it's "most targets ignore this issue, and X86 does X".
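
A rough sketch of what wrapping the two enums in one type can look like. The enumerator names follow the OperandValueKind and OperandValueProperties values as I understand them from TTI at the time; the OperandInfo struct and its helper methods are assumptions for illustration and are not copied from the actual change.

```cpp
#include <cassert>

// Kind of value an operand holds (enumerator names assumed to match TTI).
enum OperandValueKind {
  OK_AnyValue,
  OK_UniformValue,
  OK_UniformConstantValue,
  OK_NonUniformConstantValue
};

// Extra property known about the operand.
enum OperandValueProperties { OP_None, OP_PowerOf2 };

// Hypothetical merged descriptor: one object instead of two parallel enums.
struct OperandInfo {
  OperandValueKind Kind;
  OperandValueProperties Properties;

  bool isConstant() const {
    return Kind == OK_UniformConstantValue ||
           Kind == OK_NonUniformConstantValue;
  }
  bool isUniform() const {
    return Kind == OK_UniformValue || Kind == OK_UniformConstantValue;
  }
  bool isPowerOf2() const { return Properties == OP_PowerOf2; }
};
```

With a single descriptor, adding a new property means adding one field and one query, rather than threading another enum parameter through every cost function.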

FYI, this change causes a regression with Flang due to store-forwarding issues. I am not sure if it is Flang-specific - please take a look: https://github.com/llvm/llvm-project/issues/57322

reames mentioned this in D132566: [SLP] Fix cost model w.r.t. operand properties. Aug 24 2022, 8:12 AM

reames mentioned this in D132680: [RISCV] Disable SLP vectorization by default due to unresolved profitability issues. Aug 25 2022, 10:48 AM

FYI this change caused a noticeable compile-time regression: http://llvm-compile-time-tracker.com/compare.php?from=31fbcccb3136b9da99e7bc95007e553403fcd641&to=0e7ed32c71362f3547329c6ee8573a8bc191f58a&stat=instructions Highest impact seems to be 7.5% on constants.c from mafft. I don't see anything obvious that can be optimized here though.

In D126885#3755038, @nikic wrote:

FYI this change caused a noticeable compile-time regression: http://llvm-compile-time-tracker.com/compare.php?from=31fbcccb3136b9da99e7bc95007e553403fcd641&to=0e7ed32c71362f3547329c6ee8573a8bc191f58a&stat=instructions Highest impact seems to be 7.5% on constants.c from mafft. I don't see anything obvious that can be optimized here though.

Hopefully D132750 will fix it for some FP cases.

reames mentioned this in rG42ef5720493e: [SLP] Fix cost model w.r.t. operand properties. Sep 23 2022, 8:40 AM

Revision Contents

Path	Size
llvm/include/llvm/Analysis/TargetTransformInfo.h	15 lines
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h	11 lines
llvm/include/llvm/CodeGen/BasicTTIImpl.h	9 lines
llvm/lib/Analysis/TargetTransformInfo.cpp	7 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h	9 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp	4 lines
llvm/lib/Target/ARM/ARMTargetTransformInfo.h	9 lines
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp	3 lines
llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h	9 lines
llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp	7 lines
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h	9 lines
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp	1 line
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h	6 lines
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp	26 lines
llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h	9 lines
llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp	1 line
llvm/lib/Target/X86/X86TargetTransformInfo.h	9 lines
llvm/lib/Target/X86/X86TargetTransformInfo.cpp	22 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	13 lines
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	49 lines
llvm/test/Analysis/CostModel/X86/arith-fp.ll	78 lines
llvm/test/Transforms/SLPVectorizer/RISCV/rvv-min-vector-size.ll	15 lines
llvm/test/Transforms/SLPVectorizer/X86/crash_bullet.ll	13 lines

Diff 439821

llvm/include/llvm/Analysis/TargetTransformInfo.h

[1,163 lines not shown]	InstructionCost getReplicationShuffleCost(Type *EltTy, int ReplicationFactor,
const APInt &DemandedDstElts,		const APInt &DemandedDstElts,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);

/// \return The cost of Load and Store instructions.		/// \return The cost of Load and Store instructions.
InstructionCost		InstructionCost
getMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,		getMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
unsigned AddressSpace,		unsigned AddressSpace,
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,		TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,
		OperandValueKind OpdInfo = OK_AnyValue,
const Instruction *I = nullptr) const;		const Instruction *I = nullptr) const;

/// \return The cost of VP Load and Store instructions.		/// \return The cost of VP Load and Store instructions.
InstructionCost		InstructionCost
getVPMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,		getVPMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
unsigned AddressSpace,		unsigned AddressSpace,
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,		TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,
const Instruction *I = nullptr) const;		const Instruction *I = nullptr) const;
[540 lines not shown]	public:
virtual InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,		virtual InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
unsigned Index) = 0;		unsigned Index) = 0;

virtual InstructionCost		virtual InstructionCost
getReplicationShuffleCost(Type *EltTy, int ReplicationFactor, int VF,		getReplicationShuffleCost(Type *EltTy, int ReplicationFactor, int VF,
const APInt &DemandedDstElts,		const APInt &DemandedDstElts,
TTI::TargetCostKind CostKind) = 0;		TTI::TargetCostKind CostKind) = 0;

virtual InstructionCost getMemoryOpCost(unsigned Opcode, Type *Src,		virtual InstructionCost
Align Alignment,		getMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
unsigned AddressSpace,		unsigned AddressSpace, TTI::TargetCostKind CostKind,
TTI::TargetCostKind CostKind,		OperandValueKind OpdInfo, const Instruction *I) = 0;
const Instruction *I) = 0;
virtual InstructionCost getVPMemoryOpCost(unsigned Opcode, Type *Src,		virtual InstructionCost getVPMemoryOpCost(unsigned Opcode, Type *Src,
Align Alignment,		Align Alignment,
unsigned AddressSpace,		unsigned AddressSpace,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I) = 0;		const Instruction *I) = 0;
virtual InstructionCost		virtual InstructionCost
getMaskedMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,		getMaskedMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
unsigned AddressSpace,		unsigned AddressSpace,
[529 lines not shown]	getReplicationShuffleCost(Type *EltTy, int ReplicationFactor, int VF,
const APInt &DemandedDstElts,		const APInt &DemandedDstElts,
TTI::TargetCostKind CostKind) override {		TTI::TargetCostKind CostKind) override {
return Impl.getReplicationShuffleCost(EltTy, ReplicationFactor, VF,		return Impl.getReplicationShuffleCost(EltTy, ReplicationFactor, VF,
DemandedDstElts, CostKind);		DemandedDstElts, CostKind);
}		}
InstructionCost getMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,		InstructionCost getMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
unsigned AddressSpace,		unsigned AddressSpace,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
		OperandValueKind OpdInfo,
const Instruction *I) override {		const Instruction *I) override {
return Impl.getMemoryOpCost(Opcode, Src, Alignment, AddressSpace,		return Impl.getMemoryOpCost(Opcode, Src, Alignment, AddressSpace, CostKind,
CostKind, I);		OpdInfo, I);
}		}
InstructionCost getVPMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,		InstructionCost getVPMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
unsigned AddressSpace,		unsigned AddressSpace,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I) override {		const Instruction *I) override {
return Impl.getVPMemoryOpCost(Opcode, Src, Alignment, AddressSpace,		return Impl.getVPMemoryOpCost(Opcode, Src, Alignment, AddressSpace,
CostKind, I);		CostKind, I);
}		}
[275 lines not shown]

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

[574 lines not shown]	unsigned getReplicationShuffleCost(Type *EltTy, int ReplicationFactor, int VF,
const APInt &DemandedDstElts,		const APInt &DemandedDstElts,
TTI::TargetCostKind CostKind) {		TTI::TargetCostKind CostKind) {
return 1;		return 1;
}		}

InstructionCost getMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,		InstructionCost getMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
unsigned AddressSpace,		unsigned AddressSpace,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
		TTI::OperandValueKind OpdInfo,
const Instruction *I) const {		const Instruction *I) const {
return 1;		return 1;
}		}

InstructionCost getVPMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,		InstructionCost getVPMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
unsigned AddressSpace,		unsigned AddressSpace,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I) const {		const Instruction *I) const {
[478 lines not shown]	InstructionCost getUserCost(const User *U, ArrayRef<const Value *> Operands,
case Instruction::AddrSpaceCast: {		case Instruction::AddrSpaceCast: {
Type *OpTy = U->getOperand(0)->getType();		Type *OpTy = U->getOperand(0)->getType();
return TargetTTI->getCastInstrCost(		return TargetTTI->getCastInstrCost(
Opcode, Ty, OpTy, TTI::getCastContextHint(I), CostKind, I);		Opcode, Ty, OpTy, TTI::getCastContextHint(I), CostKind, I);
}		}
case Instruction::Store: {		case Instruction::Store: {
auto *SI = cast<StoreInst>(U);		auto *SI = cast<StoreInst>(U);
Type *ValTy = U->getOperand(0)->getType();		Type *ValTy = U->getOperand(0)->getType();
		TTI::OperandValueProperties OpVP = TTI::OP_None;
		TTI::OperandValueKind OpVK = TTI::getOperandInfo(U->getOperand(0), OpVP);
return TargetTTI->getMemoryOpCost(Opcode, ValTy, SI->getAlign(),		return TargetTTI->getMemoryOpCost(Opcode, ValTy, SI->getAlign(),
SI->getPointerAddressSpace(),		SI->getPointerAddressSpace(), CostKind,
CostKind, I);		OpVK, I);
}		}
case Instruction::Load: {		case Instruction::Load: {
auto *LI = cast<LoadInst>(U);		auto *LI = cast<LoadInst>(U);
Type *LoadType = U->getType();		Type *LoadType = U->getType();
// If there is a non-register sized type, the cost estimation may expand		// If there is a non-register sized type, the cost estimation may expand
// it to be several instructions to load into multiple registers on the		// it to be several instructions to load into multiple registers on the
// target. But, if the only use of the load is a trunc instruction to a		// target. But, if the only use of the load is a trunc instruction to a
// register sized type, the instruction selector can combine these		// register sized type, the instruction selector can combine these
// instructions to be a single load. So, in this case, we use the		// instructions to be a single load. So, in this case, we use the
// destination type of the trunc instruction rather than the load to		// destination type of the trunc instruction rather than the load to
// accurately estimate the cost of this load instruction.		// accurately estimate the cost of this load instruction.
if (CostKind == TTI::TCK_CodeSize && LI->hasOneUse() &&		if (CostKind == TTI::TCK_CodeSize && LI->hasOneUse() &&
!LoadType->isVectorTy()) {		!LoadType->isVectorTy()) {
if (const TruncInst *TI = dyn_cast<TruncInst>(*LI->user_begin()))		if (const TruncInst *TI = dyn_cast<TruncInst>(*LI->user_begin()))
LoadType = TI->getDestTy();		LoadType = TI->getDestTy();
}		}
return TargetTTI->getMemoryOpCost(Opcode, LoadType, LI->getAlign(),		return TargetTTI->getMemoryOpCost(Opcode, LoadType, LI->getAlign(),
LI->getPointerAddressSpace(),		LI->getPointerAddressSpace(), CostKind,
CostKind, I);		TTI::OK_AnyValue, I);
}		}
case Instruction::Select: {		case Instruction::Select: {
const Value *Op0, *Op1;		const Value *Op0, *Op1;
if (match(U, m_LogicalAnd(m_Value(Op0), m_Value(Op1))) ||		if (match(U, m_LogicalAnd(m_Value(Op0), m_Value(Op1))) ||
match(U, m_LogicalOr(m_Value(Op0), m_Value(Op1)))) {		match(U, m_LogicalOr(m_Value(Op0), m_Value(Op1)))) {
// select x, y, false --> x & y		// select x, y, false --> x & y
// select x, true, y --> x | y		// select x, true, y --> x | y
TTI::OperandValueProperties Op1VP = TTI::OP_None;		TTI::OperandValueProperties Op1VP = TTI::OP_None;
[164 lines not shown]

llvm/include/llvm/CodeGen/BasicTTIImpl.h

[1,181 lines not shown]	Cost += thisT()->getScalarizationOverhead(SrcVT, DemandedSrcElts,
/*Extract*/ true);		/*Extract*/ true);
Cost +=		Cost +=
thisT()->getScalarizationOverhead(ReplicatedVT, DemandedDstElts,		thisT()->getScalarizationOverhead(ReplicatedVT, DemandedDstElts,
/*Insert*/ true, /*Extract*/ false);		/*Insert*/ true, /*Extract*/ false);

return Cost;		return Cost;
}		}

InstructionCost getMemoryOpCost(unsigned Opcode, Type *Src,		InstructionCost
MaybeAlign Alignment, unsigned AddressSpace,		getMemoryOpCost(unsigned Opcode, Type *Src, MaybeAlign Alignment,
TTI::TargetCostKind CostKind,		unsigned AddressSpace, TTI::TargetCostKind CostKind,
		TTI::OperandValueKind OpdInfo = TTI::OK_AnyValue,
const Instruction *I = nullptr) {		const Instruction *I = nullptr) {
assert(!Src->isVoidTy() && "Invalid type");		assert(!Src->isVoidTy() && "Invalid type");
// Assume types, such as structs, are expensive.		// Assume types, such as structs, are expensive.
if (getTLI()->getValueType(DL, Src, true) == MVT::Other)		if (getTLI()->getValueType(DL, Src, true) == MVT::Other)
return 4;		return 4;
std::pair<InstructionCost, MVT> LT =		std::pair<InstructionCost, MVT> LT =
getTLI()->getTypeLegalizationCost(DL, Src);		getTLI()->getTypeLegalizationCost(DL, Src);

// Assuming that all loads of legal types cost 1.		// Assuming that all loads of legal types cost 1.
[1,160 lines not shown]

llvm/lib/Analysis/TargetTransformInfo.cpp

[869 lines not shown]	InstructionCost TargetTransformInfo::getReplicationShuffleCost(
InstructionCost Cost = TTIImpl->getReplicationShuffleCost(		InstructionCost Cost = TTIImpl->getReplicationShuffleCost(
EltTy, ReplicationFactor, VF, DemandedDstElts, CostKind);		EltTy, ReplicationFactor, VF, DemandedDstElts, CostKind);
assert(Cost >= 0 && "TTI should not produce negative costs!");		assert(Cost >= 0 && "TTI should not produce negative costs!");
return Cost;		return Cost;
}		}

InstructionCost TargetTransformInfo::getMemoryOpCost(		InstructionCost TargetTransformInfo::getMemoryOpCost(
unsigned Opcode, Type *Src, Align Alignment, unsigned AddressSpace,		unsigned Opcode, Type *Src, Align Alignment, unsigned AddressSpace,
TTI::TargetCostKind CostKind, const Instruction *I) const {		TTI::TargetCostKind CostKind, TTI::OperandValueKind OpdInfo,
		const Instruction *I) const {
assert((I == nullptr || I->getOpcode() == Opcode) &&		assert((I == nullptr || I->getOpcode() == Opcode) &&
"Opcode should reflect passed instruction.");		"Opcode should reflect passed instruction.");
InstructionCost Cost = TTIImpl->getMemoryOpCost(Opcode, Src, Alignment,		InstructionCost Cost = TTIImpl->getMemoryOpCost(
AddressSpace, CostKind, I);		Opcode, Src, Alignment, AddressSpace, CostKind, OpdInfo, I);
assert(Cost >= 0 && "TTI should not produce negative costs!");		assert(Cost >= 0 && "TTI should not produce negative costs!");
return Cost;		return Cost;
}		}

InstructionCost TargetTransformInfo::getMaskedMemoryOpCost(		InstructionCost TargetTransformInfo::getMaskedMemoryOpCost(
unsigned Opcode, Type *Src, Align Alignment, unsigned AddressSpace,		unsigned Opcode, Type *Src, Align Alignment, unsigned AddressSpace,
TTI::TargetCostKind CostKind) const {		TTI::TargetCostKind CostKind) const {
InstructionCost Cost = TTIImpl->getMaskedMemoryOpCost(Opcode, Src, Alignment,		InstructionCost Cost = TTIImpl->getMaskedMemoryOpCost(Opcode, Src, Alignment,
[340 lines not shown]

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h

[201 lines not shown]	InstructionCost getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,
CmpInst::Predicate VecPred,		CmpInst::Predicate VecPred,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);		const Instruction *I = nullptr);

TTI::MemCmpExpansionOptions enableMemCmpExpansion(bool OptSize,		TTI::MemCmpExpansionOptions enableMemCmpExpansion(bool OptSize,
bool IsZeroCmp) const;		bool IsZeroCmp) const;
bool useNeonVector(const Type *Ty) const;		bool useNeonVector(const Type *Ty) const;

InstructionCost getMemoryOpCost(unsigned Opcode, Type *Src,		InstructionCost
MaybeAlign Alignment, unsigned AddressSpace,		getMemoryOpCost(unsigned Opcode, Type *Src, MaybeAlign Alignment,
TTI::TargetCostKind CostKind,		unsigned AddressSpace, TTI::TargetCostKind CostKind,
		TTI::OperandValueKind OpdInfo = TTI::OK_AnyValue,
const Instruction *I = nullptr);		const Instruction *I = nullptr);

InstructionCost getCostOfKeepingLiveOverCall(ArrayRef<Type *> Tys);		InstructionCost getCostOfKeepingLiveOverCall(ArrayRef<Type *> Tys);

void getUnrollingPreferences(Loop *L, ScalarEvolution &SE,		void getUnrollingPreferences(Loop *L, ScalarEvolution &SE,
TTI::UnrollingPreferences &UP,		TTI::UnrollingPreferences &UP,
OptimizationRemarkEmitter *ORE);		OptimizationRemarkEmitter *ORE);

void getPeelingPreferences(Loop *L, ScalarEvolution &SE,		void getPeelingPreferences(Loop *L, ScalarEvolution &SE,
[140 lines not shown]

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

[2,125 lines not shown]	InstructionCost AArch64TTIImpl::getGatherScatterOpCost(
// it. This change will be removed when code-generation for these types is		// it. This change will be removed when code-generation for these types is
// sufficiently reliable.		// sufficiently reliable.
if (cast<VectorType>(DataTy)->getElementCount() ==		if (cast<VectorType>(DataTy)->getElementCount() ==
ElementCount::getScalable(1))		ElementCount::getScalable(1))
return InstructionCost::getInvalid();		return InstructionCost::getInvalid();

ElementCount LegalVF = LT.second.getVectorElementCount();		ElementCount LegalVF = LT.second.getVectorElementCount();
InstructionCost MemOpCost =		InstructionCost MemOpCost =
getMemoryOpCost(Opcode, VT->getElementType(), Alignment, 0, CostKind, I);		getMemoryOpCost(Opcode, VT->getElementType(), Alignment, 0, CostKind,
		TTI::OK_AnyValue, I);
// Add on an overhead cost for using gathers/scatters.		// Add on an overhead cost for using gathers/scatters.
// TODO: At the moment this is applied unilaterally for all CPUs, but at some		// TODO: At the moment this is applied unilaterally for all CPUs, but at some
// point we may want a per-CPU overhead.		// point we may want a per-CPU overhead.
MemOpCost *= getSVEGatherScatterOverhead(Opcode);		MemOpCost *= getSVEGatherScatterOverhead(Opcode);
return LT.first * MemOpCost * getMaxNumElements(LegalVF);		return LT.first * MemOpCost * getMaxNumElements(LegalVF);
}		}

bool AArch64TTIImpl::useNeonVector(const Type *Ty) const {		bool AArch64TTIImpl::useNeonVector(const Type *Ty) const {
return isa<FixedVectorType>(Ty) && !ST->useSVEForFixedLengthVectors();		return isa<FixedVectorType>(Ty) && !ST->useSVEForFixedLengthVectors();
}		}

InstructionCost AArch64TTIImpl::getMemoryOpCost(unsigned Opcode, Type *Ty,		InstructionCost AArch64TTIImpl::getMemoryOpCost(unsigned Opcode, Type *Ty,
MaybeAlign Alignment,		MaybeAlign Alignment,
unsigned AddressSpace,		unsigned AddressSpace,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
		TTI::OperandValueKind OpdInfo,
const Instruction *I) {		const Instruction *I) {
EVT VT = TLI->getValueType(DL, Ty, true);		EVT VT = TLI->getValueType(DL, Ty, true);
// Type legalization can't handle structs		// Type legalization can't handle structs
if (VT == MVT::Other)		if (VT == MVT::Other)
return BaseT::getMemoryOpCost(Opcode, Ty, Alignment, AddressSpace,		return BaseT::getMemoryOpCost(Opcode, Ty, Alignment, AddressSpace,
CostKind);		CostKind);

auto LT = TLI->getTypeLegalizationCost(DL, Ty);		auto LT = TLI->getTypeLegalizationCost(DL, Ty);
[754 lines not shown]

llvm/lib/Target/ARM/ARMTargetTransformInfo.h

[246 lines not shown]	InstructionCost getArithmeticInstrCost(
unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,		unsigned Opcode, Type *Ty, TTI::TargetCostKind CostKind,
TTI::OperandValueKind Op1Info = TTI::OK_AnyValue,		TTI::OperandValueKind Op1Info = TTI::OK_AnyValue,
TTI::OperandValueKind Op2Info = TTI::OK_AnyValue,		TTI::OperandValueKind Op2Info = TTI::OK_AnyValue,
TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,
TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,
ArrayRef<const Value *> Args = ArrayRef<const Value *>(),		ArrayRef<const Value *> Args = ArrayRef<const Value *>(),
const Instruction *CxtI = nullptr);		const Instruction *CxtI = nullptr);

InstructionCost getMemoryOpCost(unsigned Opcode, Type *Src,		InstructionCost
MaybeAlign Alignment, unsigned AddressSpace,		getMemoryOpCost(unsigned Opcode, Type *Src, MaybeAlign Alignment,
TTI::TargetCostKind CostKind,		unsigned AddressSpace, TTI::TargetCostKind CostKind,
		TTI::OperandValueKind OpdInfo = TTI::OK_AnyValue,
const Instruction *I = nullptr);		const Instruction *I = nullptr);

InstructionCost getMaskedMemoryOpCost(unsigned Opcode, Type *Src,		InstructionCost getMaskedMemoryOpCost(unsigned Opcode, Type *Src,
Align Alignment, unsigned AddressSpace,		Align Alignment, unsigned AddressSpace,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);

InstructionCost getInterleavedMemoryOpCost(		InstructionCost getInterleavedMemoryOpCost(
unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,		unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,
Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind,		Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind,
[82 lines not shown]

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp

[1,445 lines not shown]	InstructionCost ARMTTIImpl::getArithmeticInstrCost(

return BaseCost;		return BaseCost;
}		}

InstructionCost ARMTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,		InstructionCost ARMTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,
MaybeAlign Alignment,		MaybeAlign Alignment,
unsigned AddressSpace,		unsigned AddressSpace,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
		TTI::OperandValueKind OpdInfo,
const Instruction *I) {		const Instruction *I) {
// TODO: Handle other cost kinds.		// TODO: Handle other cost kinds.
if (CostKind != TTI::TCK_RecipThroughput)		if (CostKind != TTI::TCK_RecipThroughput)
return 1;		return 1;

// Type legalization can't handle structs		// Type legalization can't handle structs
if (TLI->getValueType(DL, Src, true) == MVT::Other)		if (TLI->getValueType(DL, Src, true) == MVT::Other)
return BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace,		return BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace,
[23 lines not shown]	if (SrcVTy->getNumElements() == 4 && SrcVTy->getScalarType()->isHalfTy() &&
DstTy->getScalarType()->isFloatTy())		DstTy->getScalarType()->isFloatTy())
return ST->getMVEVectorCostFactor(CostKind);		return ST->getMVEVectorCostFactor(CostKind);
}		}

int BaseCost = ST->hasMVEIntegerOps() && Src->isVectorTy()		int BaseCost = ST->hasMVEIntegerOps() && Src->isVectorTy()
? ST->getMVEVectorCostFactor(CostKind)		? ST->getMVEVectorCostFactor(CostKind)
: 1;		: 1;
return BaseCost * BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace,		return BaseCost * BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace,
CostKind, I);		CostKind, OpdInfo, I);
}		}

InstructionCost		InstructionCost
ARMTTIImpl::getMaskedMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,		ARMTTIImpl::getMaskedMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
unsigned AddressSpace,		unsigned AddressSpace,
TTI::TargetCostKind CostKind) {		TTI::TargetCostKind CostKind) {
if (ST->hasMVEIntegerOps()) {		if (ST->hasMVEIntegerOps()) {
if (Opcode == Instruction::Load && isLegalMaskedLoad(Src, Alignment))		if (Opcode == Instruction::Load && isLegalMaskedLoad(Src, Alignment))
[886 lines not shown]

llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h

[110 lines not shown]	InstructionCost getOperandsScalarizationOverhead(ArrayRef<const Value *> Args,
ArrayRef<Type *> Tys);		ArrayRef<Type *> Tys);
InstructionCost getCallInstrCost(Function *F, Type *RetTy,		InstructionCost getCallInstrCost(Function *F, Type *RetTy,
ArrayRef<Type *> Tys,		ArrayRef<Type *> Tys,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);
InstructionCost getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,		InstructionCost getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);
InstructionCost getAddressComputationCost(Type *Tp, ScalarEvolution *SE,		InstructionCost getAddressComputationCost(Type *Tp, ScalarEvolution *SE,
const SCEV *S);		const SCEV *S);
InstructionCost getMemoryOpCost(unsigned Opcode, Type *Src,		InstructionCost
MaybeAlign Alignment, unsigned AddressSpace,		getMemoryOpCost(unsigned Opcode, Type *Src, MaybeAlign Alignment,
TTI::TargetCostKind CostKind,		unsigned AddressSpace, TTI::TargetCostKind CostKind,
		TTI::OperandValueKind OpdInfo = TTI::OK_AnyValue,
const Instruction *I = nullptr);		const Instruction *I = nullptr);
InstructionCost getMaskedMemoryOpCost(unsigned Opcode, Type *Src,		InstructionCost getMaskedMemoryOpCost(unsigned Opcode, Type *Src,
Align Alignment, unsigned AddressSpace,		Align Alignment, unsigned AddressSpace,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);
InstructionCost getShuffleCost(TTI::ShuffleKind Kind, Type *Tp,		InstructionCost getShuffleCost(TTI::ShuffleKind Kind, Type *Tp,
ArrayRef<int> Mask, int Index, Type *SubTp,		ArrayRef<int> Mask, int Index, Type *SubTp,
ArrayRef<const Value *> Args = None);		ArrayRef<const Value *> Args = None);
InstructionCost getGatherScatterOpCost(unsigned Opcode, Type *DataTy,		InstructionCost getGatherScatterOpCost(unsigned Opcode, Type *DataTy,
const Value *Ptr, bool VariableMask,		const Value *Ptr, bool VariableMask,
[45 lines not shown]

llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp

[155 lines not shown]	InstructionCost HexagonTTIImpl::getAddressComputationCost(Type *Tp,
const SCEV *S) {		const SCEV *S) {
return 0;		return 0;
}		}

InstructionCost HexagonTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,		InstructionCost HexagonTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,
MaybeAlign Alignment,		MaybeAlign Alignment,
unsigned AddressSpace,		unsigned AddressSpace,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
		TTI::OperandValueKind OpdInfo,
const Instruction *I) {		const Instruction *I) {
assert(Opcode == Instruction::Load || Opcode == Instruction::Store);		assert(Opcode == Instruction::Load || Opcode == Instruction::Store);
// TODO: Handle other cost kinds.		// TODO: Handle other cost kinds.
if (CostKind != TTI::TCK_RecipThroughput)		if (CostKind != TTI::TCK_RecipThroughput)
return 1;		return 1;

if (Opcode == Instruction::Store)		if (Opcode == Instruction::Store)
return BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace,		return BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace,
CostKind, I);		CostKind, OpdInfo, I);

if (Src->isVectorTy()) {		if (Src->isVectorTy()) {
VectorType *VecTy = cast<VectorType>(Src);		VectorType *VecTy = cast<VectorType>(Src);
unsigned VecWidth = VecTy->getPrimitiveSizeInBits().getFixedSize();		unsigned VecWidth = VecTy->getPrimitiveSizeInBits().getFixedSize();
if (useHVX() && ST.isTypeForHVX(VecTy)) {		if (useHVX() && ST.isTypeForHVX(VecTy)) {
unsigned RegWidth =		unsigned RegWidth =
getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector)		getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector)
.getFixedSize();		.getFixedSize();
[23 lines not shown]	if (Src->isVectorTy()) {
if (Alignment == Align(4) || Alignment == Align(8))		if (Alignment == Align(4) || Alignment == Align(8))
return Cost * NumLoads;		return Cost * NumLoads;
// Loads of less than 32 bits will need extra inserts to compose a vector.		// Loads of less than 32 bits will need extra inserts to compose a vector.
assert(BoundAlignment <= Align(8));		assert(BoundAlignment <= Align(8));
unsigned LogA = Log2(BoundAlignment);		unsigned LogA = Log2(BoundAlignment);
return (3 - LogA) * Cost * NumLoads;		return (3 - LogA) * Cost * NumLoads;
}		}

return BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace,		return BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace, CostKind,
CostKind, I);		OpdInfo, I);
}		}

InstructionCost		InstructionCost
HexagonTTIImpl::getMaskedMemoryOpCost(unsigned Opcode, Type *Src,		HexagonTTIImpl::getMaskedMemoryOpCost(unsigned Opcode, Type *Src,
Align Alignment, unsigned AddressSpace,		Align Alignment, unsigned AddressSpace,
TTI::TargetCostKind CostKind) {		TTI::TargetCostKind CostKind) {
return BaseT::getMaskedMemoryOpCost(Opcode, Src, Alignment, AddressSpace,		return BaseT::getMaskedMemoryOpCost(Opcode, Src, Alignment, AddressSpace,
CostKind);		CostKind);
[153 lines not shown]

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h

[119 lines not shown]	public:
InstructionCost getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind,		InstructionCost getCFInstrCost(unsigned Opcode, TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);		const Instruction *I = nullptr);
InstructionCost getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,		InstructionCost getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,
CmpInst::Predicate VecPred,		CmpInst::Predicate VecPred,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);		const Instruction *I = nullptr);
InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,		InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
unsigned Index);		unsigned Index);
InstructionCost getMemoryOpCost(unsigned Opcode, Type *Src,		InstructionCost
MaybeAlign Alignment, unsigned AddressSpace,		getMemoryOpCost(unsigned Opcode, Type *Src, MaybeAlign Alignment,
TTI::TargetCostKind CostKind,		unsigned AddressSpace, TTI::TargetCostKind CostKind,
		TTI::OperandValueKind OpdInfo = TTI::OK_AnyValue,
const Instruction *I = nullptr);		const Instruction *I = nullptr);
InstructionCost getInterleavedMemoryOpCost(		InstructionCost getInterleavedMemoryOpCost(
unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,		unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,
Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind,		Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind,
bool UseMaskForCond = false, bool UseMaskForGaps = false);		bool UseMaskForCond = false, bool UseMaskForGaps = false);
InstructionCost getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,		InstructionCost getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);
bool areTypesABICompatible(const Function *Caller, const Function *Callee,		bool areTypesABICompatible(const Function *Caller, const Function *Callee,
const ArrayRef<Type *> &Types) const;		const ArrayRef<Type *> &Types) const;
[17 lines not shown]

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp

[1,139 lines not shown]	InstructionCost PPCTTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,

return Cost;		return Cost;
}		}

InstructionCost PPCTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,		InstructionCost PPCTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,
MaybeAlign Alignment,		MaybeAlign Alignment,
unsigned AddressSpace,		unsigned AddressSpace,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
		TTI::OperandValueKind OpdInfo,
const Instruction *I) {		const Instruction *I) {

InstructionCost CostFactor = vectorCostAdjustmentFactor(Opcode, Src, nullptr);		InstructionCost CostFactor = vectorCostAdjustmentFactor(Opcode, Src, nullptr);
if (!CostFactor.isValid())		if (!CostFactor.isValid())
return InstructionCost::getMax();		return InstructionCost::getMax();

if (TLI->getValueType(DL, Src, true) == MVT::Other)		if (TLI->getValueType(DL, Src, true) == MVT::Other)
return BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace,		return BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace,
[307 lines not shown]

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h

[99 lines not shown]	public:
InstructionCost getMinMaxReductionCost(VectorType *Ty, VectorType *CondTy,		InstructionCost getMinMaxReductionCost(VectorType *Ty, VectorType *CondTy,
bool IsUnsigned,		bool IsUnsigned,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);

InstructionCost getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,		InstructionCost getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,
Optional<FastMathFlags> FMF,		Optional<FastMathFlags> FMF,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);

		InstructionCost
		getMemoryOpCost(unsigned Opcode, Type *Src, MaybeAlign Alignment,
		unsigned AddressSpace, TTI::TargetCostKind CostKind,
		TTI::OperandValueKind OpdInfo = TTI::OK_AnyValue,
		const Instruction *I = nullptr);

bool isElementTypeLegalForScalableVector(Type *Ty) const {		bool isElementTypeLegalForScalableVector(Type *Ty) const {
return TLI->isLegalElementTypeForRVV(Ty);		return TLI->isLegalElementTypeForRVV(Ty);
}		}

bool isLegalMaskedLoadStore(Type *DataType, Align Alignment) {		bool isLegalMaskedLoadStore(Type *DataType, Align Alignment) {
if (!ST->hasVInstructions())		if (!ST->hasVInstructions())
return false;		return false;

[156 lines not shown]

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

[230 lines not shown]	if ((Opcode == Instruction::Load &&
!isLegalMaskedScatter(DataTy, Align(Alignment))))		!isLegalMaskedScatter(DataTy, Align(Alignment))))
return BaseT::getGatherScatterOpCost(Opcode, DataTy, Ptr, VariableMask,		return BaseT::getGatherScatterOpCost(Opcode, DataTy, Ptr, VariableMask,
Alignment, CostKind, I);		Alignment, CostKind, I);

// Cost is proportional to the number of memory operations implied. For		// Cost is proportional to the number of memory operations implied. For
// scalable vectors, we use an upper bound on that number since we don't		// scalable vectors, we use an upper bound on that number since we don't
// know exactly what VL will be.		// know exactly what VL will be.
auto &VTy = *cast<VectorType>(DataTy);		auto &VTy = *cast<VectorType>(DataTy);
InstructionCost MemOpCost = getMemoryOpCost(Opcode, VTy.getElementType(),		InstructionCost MemOpCost =
Alignment, 0, CostKind, I);		getMemoryOpCost(Opcode, VTy.getElementType(), Alignment, 0, CostKind,
		TTI::OK_AnyValue, I);
if (isa<ScalableVectorType>(VTy)) {		if (isa<ScalableVectorType>(VTy)) {
const unsigned EltSize = DL.getTypeSizeInBits(VTy.getElementType());		const unsigned EltSize = DL.getTypeSizeInBits(VTy.getElementType());
const unsigned MinSize = DL.getTypeSizeInBits(&VTy).getKnownMinValue();		const unsigned MinSize = DL.getTypeSizeInBits(&VTy).getKnownMinValue();
const unsigned VectorBitsMax = ST->getRealMaxVLen();		const unsigned VectorBitsMax = ST->getRealMaxVLen();
const unsigned MaxVLMAX =		const unsigned MaxVLMAX =
RISCVTargetLowering::computeVLMAX(VectorBitsMax, EltSize, MinSize);		RISCVTargetLowering::computeVLMAX(VectorBitsMax, EltSize, MinSize);
return MaxVLMAX * MemOpCost;		return MaxVLMAX * MemOpCost;
}		}
▲ Show 20 Lines • Show All 122 Lines • ▼ Show 20 Lines	RISCVTTIImpl::getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,
// IR Reduction is composed by two vmv and one rvv reduction instruction.		// IR Reduction is composed by two vmv and one rvv reduction instruction.
InstructionCost BaseCost = 2;		InstructionCost BaseCost = 2;
unsigned VL = cast<FixedVectorType>(Ty)->getNumElements();		unsigned VL = cast<FixedVectorType>(Ty)->getNumElements();
if (TTI::requiresOrderedReduction(FMF))		if (TTI::requiresOrderedReduction(FMF))
return (LT.first - 1) + BaseCost + VL;		return (LT.first - 1) + BaseCost + VL;
return (LT.first - 1) + BaseCost + Log2_32_Ceil(VL);		return (LT.first - 1) + BaseCost + Log2_32_Ceil(VL);
}		}

		InstructionCost RISCVTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,
		MaybeAlign Alignment,
		unsigned AddressSpace,
		TTI::TargetCostKind CostKind,
		TTI::OperandValueKind OpdInfo,
		const Instruction *I) {
		InstructionCost Cost = 0;
		if (Opcode == Instruction::Store && isa<VectorType>(Src) &&
		(OpdInfo == TTI::OK_UniformConstantValue ||
		OpdInfo == TTI::OK_NonUniformConstantValue)) {
		APInt PseudoAddr = APInt::getAllOnes(DL.getPointerSizeInBits());
		// Add a cost of address load + the cost of the vector load.
		Cost += RISCVMatInt::getIntMatCost(PseudoAddr, DL.getPointerSizeInBits(),
		getST()->getFeatureBits()) +
		getMemoryOpCost(Instruction::Load, Src, DL.getABITypeAlign(Src),
		/*AddressSpace=*/0, CostKind);
		}
		return Cost + BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace,
		CostKind, OpdInfo, I);
		}
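
The rule added to RISCVTTIImpl::getMemoryOpCost above can be modelled outside LLVM as a small standalone sketch. Everything here is an assumption made for illustration: the enums, the ToyCosts struct, the memoryOpCost function and the unit costs are hypothetical stand-ins, not LLVM APIs. It only shows the shape of the rule: a vector store of a constant pays for materializing the constant-pool address and loading the vector, on top of the base store cost.

```cpp
// Hypothetical stand-ins for the TTI enums; these are not the LLVM definitions.
enum class Opcode { Load, Store };
enum class OperandKind { AnyValue, UniformConstant, NonUniformConstant };

// Assumed unit costs; real values come from the target cost model.
struct ToyCosts {
  int AddrMaterialization; // stand-in for RISCVMatInt::getIntMatCost(...)
  int VectorLoad;          // stand-in for the recursive Load cost query
  int BaseMemOp;           // stand-in for BaseT::getMemoryOpCost(...)
};

// Mirrors the structure of the patched function: the extra cost applies only
// to stores of vector types whose stored operand is a constant.
int memoryOpCost(Opcode Op, bool IsVector, OperandKind Kind,
                 const ToyCosts &C) {
  int Cost = 0;
  const bool IsConstant = Kind == OperandKind::UniformConstant ||
                          Kind == OperandKind::NonUniformConstant;
  if (Op == Opcode::Store && IsVector && IsConstant)
    Cost += C.AddrMaterialization + C.VectorLoad;
  return Cost + C.BaseMemOp;
}
```

With the assumed costs {2, 1, 1}, a constant vector store is modelled as 4 while a non-constant vector store stays at 1, which is the kind of difference the SLP cost comparison would then see.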

void RISCVTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE,		void RISCVTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE,
TTI::UnrollingPreferences &UP,		TTI::UnrollingPreferences &UP,
OptimizationRemarkEmitter *ORE) {		OptimizationRemarkEmitter *ORE) {
// TODO: More tuning on benchmarks and metrics with changes as needed		// TODO: More tuning on benchmarks and metrics with changes as needed
// would apply to all settings below to enable performance.		// would apply to all settings below to enable performance.


if (ST->enableDefaultUnroll())		if (ST->enableDefaultUnroll())

llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h

Show First 20 Lines • Show All 104 Lines • ▼ Show 20 Lines	InstructionCost getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
const Instruction *I = nullptr);		const Instruction *I = nullptr);
InstructionCost getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,		InstructionCost getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,
CmpInst::Predicate VecPred,		CmpInst::Predicate VecPred,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);		const Instruction *I = nullptr);
InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,		InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
unsigned Index);		unsigned Index);
bool isFoldableLoad(const LoadInst *Ld, const Instruction *&FoldedValue);		bool isFoldableLoad(const LoadInst *Ld, const Instruction *&FoldedValue);
InstructionCost getMemoryOpCost(unsigned Opcode, Type *Src,		InstructionCost
MaybeAlign Alignment, unsigned AddressSpace,		getMemoryOpCost(unsigned Opcode, Type *Src, MaybeAlign Alignment,
TTI::TargetCostKind CostKind,		unsigned AddressSpace, TTI::TargetCostKind CostKind,
		TTI::OperandValueKind OpdInfo = TTI::OK_AnyValue,
const Instruction *I = nullptr);		const Instruction *I = nullptr);

InstructionCost getInterleavedMemoryOpCost(		InstructionCost getInterleavedMemoryOpCost(
unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,		unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,
Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind,		Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind,
bool UseMaskForCond = false, bool UseMaskForGaps = false);		bool UseMaskForCond = false, bool UseMaskForGaps = false);

InstructionCost getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,		InstructionCost getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);
/// @}		/// @}
};		};

} // end namespace llvm		} // end namespace llvm

#endif		#endif

llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp

Show First 20 Lines • Show All 1,103 Lines • ▼ Show 20 Lines	if (auto *CI = dyn_cast<CallInst>(I))
return true;		return true;
return false;		return false;
}		}

InstructionCost SystemZTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,		InstructionCost SystemZTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,
MaybeAlign Alignment,		MaybeAlign Alignment,
unsigned AddressSpace,		unsigned AddressSpace,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
		TTI::OperandValueKind OpdInfo,
const Instruction *I) {		const Instruction *I) {
assert(!Src->isVoidTy() && "Invalid type");		assert(!Src->isVoidTy() && "Invalid type");

// TODO: Handle other cost kinds.		// TODO: Handle other cost kinds.
if (CostKind != TTI::TCK_RecipThroughput)		if (CostKind != TTI::TCK_RecipThroughput)
return 1;		return 1;

if (!Src->isVectorTy() && Opcode == Instruction::Load && I != nullptr) {		if (!Src->isVectorTy() && Opcode == Instruction::Load && I != nullptr) {

llvm/lib/Target/X86/X86TargetTransformInfo.h

Show First 20 Lines • Show All 149 Lines • ▼ Show 20 Lines	InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
unsigned Index);		unsigned Index);
InstructionCost getScalarizationOverhead(VectorType *Ty,		InstructionCost getScalarizationOverhead(VectorType *Ty,
const APInt &DemandedElts,		const APInt &DemandedElts,
bool Insert, bool Extract);		bool Insert, bool Extract);
InstructionCost getReplicationShuffleCost(Type *EltTy, int ReplicationFactor,		InstructionCost getReplicationShuffleCost(Type *EltTy, int ReplicationFactor,
int VF,		int VF,
const APInt &DemandedDstElts,		const APInt &DemandedDstElts,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);
InstructionCost getMemoryOpCost(unsigned Opcode, Type *Src,		InstructionCost
MaybeAlign Alignment, unsigned AddressSpace,		getMemoryOpCost(unsigned Opcode, Type *Src, MaybeAlign Alignment,
TTI::TargetCostKind CostKind,		unsigned AddressSpace, TTI::TargetCostKind CostKind,
		TTI::OperandValueKind OpdInfo = TTI::OK_AnyValue,
const Instruction *I = nullptr);		const Instruction *I = nullptr);
InstructionCost getMaskedMemoryOpCost(unsigned Opcode, Type *Src,		InstructionCost getMaskedMemoryOpCost(unsigned Opcode, Type *Src,
Align Alignment, unsigned AddressSpace,		Align Alignment, unsigned AddressSpace,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);
InstructionCost getGatherScatterOpCost(unsigned Opcode, Type *DataTy,		InstructionCost getGatherScatterOpCost(unsigned Opcode, Type *DataTy,
const Value *Ptr, bool VariableMask,		const Value *Ptr, bool VariableMask,
Align Alignment,		Align Alignment,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I);		const Instruction *I);

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

Show First 20 Lines • Show All 1,073 Lines • ▼ Show 20 Lines	if (LT.second.isVector() && (ISD == ISD::SDIV || ISD == ISD::SREM ||
ISD == ISD::UDIV || ISD == ISD::UREM)) {		ISD == ISD::UDIV || ISD == ISD::UREM)) {
InstructionCost ScalarCost = getArithmeticInstrCost(		InstructionCost ScalarCost = getArithmeticInstrCost(
Opcode, Ty->getScalarType(), CostKind, Op1Info, Op2Info,		Opcode, Ty->getScalarType(), CostKind, Op1Info, Op2Info,
TargetTransformInfo::OP_None, TargetTransformInfo::OP_None);		TargetTransformInfo::OP_None, TargetTransformInfo::OP_None);
return 20 * LT.first * LT.second.getVectorNumElements() * ScalarCost;		return 20 * LT.first * LT.second.getVectorNumElements() * ScalarCost;
}		}

// Fallback to the default implementation.		// Fallback to the default implementation.
return BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Op1Info, Op2Info);		return BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Op1Info, Op2Info,
		Opd1PropInfo, Opd2PropInfo, Args, CxtI);
}		}

InstructionCost X86TTIImpl::getShuffleCost(TTI::ShuffleKind Kind,		InstructionCost X86TTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
VectorType *BaseTp,		VectorType *BaseTp,
ArrayRef<int> Mask, int Index,		ArrayRef<int> Mask, int Index,
VectorType *SubTp,		VectorType *SubTp,
ArrayRef<const Value *> Args) {		ArrayRef<const Value *> Args) {
// 64-bit packed float vectors (v2f32) are widened to type v4f32.		// 64-bit packed float vectors (v2f32) are widened to type v4f32.
▲ Show 20 Lines • Show All 2,688 Lines • ▼ Show 20 Lines	InstructionCost X86TTIImpl::getScalarizationOverhead(VectorType *Ty,
const APInt &DemandedElts,		const APInt &DemandedElts,
bool Insert,		bool Insert,
bool Extract) {		bool Extract) {
assert(DemandedElts.getBitWidth() ==		assert(DemandedElts.getBitWidth() ==
cast<FixedVectorType>(Ty)->getNumElements() &&		cast<FixedVectorType>(Ty)->getNumElements() &&
"Vector size mismatch");		"Vector size mismatch");

std::pair<InstructionCost, MVT> LT = TLI->getTypeLegalizationCost(DL, Ty);		std::pair<InstructionCost, MVT> LT = TLI->getTypeLegalizationCost(DL, Ty);
MVT MScalarTy = LT.second.getScalarType();		MVT MScalarTy = LT.second.getScalarType();
		vdmitrie: Note that it is free for splat shift value only. https://godbolt.org/z/KMf6sW5n3
		RKSimon: Yes, vector shifts must be splats without AVX2 or XOP.
unsigned SizeInBits = LT.second.getSizeInBits();		unsigned SizeInBits = LT.second.getSizeInBits();

InstructionCost Cost = 0;		InstructionCost Cost = 0;

// For insertions, a ISD::BUILD_VECTOR style vector initialization can be much		// For insertions, a ISD::BUILD_VECTOR style vector initialization can be much
// cheaper than an accumulation of ISD::INSERT_VECTOR_ELT.		// cheaper than an accumulation of ISD::INSERT_VECTOR_ELT.
if (Insert) {		if (Insert) {
if ((MScalarTy == MVT::i16 && ST->hasSSE2()) ||		if ((MScalarTy == MVT::i16 && ST->hasSSE2()) ||
(MScalarTy.isInteger() && ST->hasSSE41()) ||		(MScalarTy.isInteger() && ST->hasSSE41()) ||
(MScalarTy == MVT::f32 && ST->hasSSE41())) {		(MScalarTy == MVT::f32 && ST->hasSSE41())) {
// For types we can insert directly, insertion into 128-bit sub vectors is		// For types we can insert directly, insertion into 128-bit sub vectors is
// cheap, followed by a cheap chain of concatenations.		// cheap, followed by a cheap chain of concatenations.
if (SizeInBits <= 128) {		if (SizeInBits <= 128) {
		RKSimon: XOP can memory fold from Idx == 0 as well.
Cost +=		Cost +=
BaseT::getScalarizationOverhead(Ty, DemandedElts, Insert, false);		BaseT::getScalarizationOverhead(Ty, DemandedElts, Insert, false);
} else {		} else {
// In each 128-lane, if at least one index is demanded but not all		// In each 128-lane, if at least one index is demanded but not all
// indices are demanded and this 128-lane is not the first 128-lane of		// indices are demanded and this 128-lane is not the first 128-lane of
// the legalized-vector, then this 128-lane needs a extracti128; If in		// the legalized-vector, then this 128-lane needs a extracti128; If in
// each 128-lane, there is at least one demanded index, this 128-lane		// each 128-lane, there is at least one demanded index, this 128-lane
		RKSimon: Instruction::Add/Sub? Also, we'd need to allow Idx == 0 || Idx == 1 for commutable ops.
		ABataev (Author): 1. I excluded Add/Sub here because a scalar Add/Sub with an immediate costs less than the vector Add/Sub (0.2-0.33 vs ~0.5). 2. We can add it later; currently there is no such analysis in getIntImmCostInst.
// needs a inserti128.		// needs a inserti128.

// The following cases will help you build a better understanding:		// The following cases will help you build a better understanding:
// Assume we insert several elements into a v8i32 vector in avx2,		// Assume we insert several elements into a v8i32 vector in avx2,
// Case#1: inserting into 1th index needs vpinsrd + inserti128.		// Case#1: inserting into 1th index needs vpinsrd + inserti128.
// Case#2: inserting into 5th index needs extracti128 + vpinsrd +		// Case#2: inserting into 5th index needs extracti128 + vpinsrd +
// inserti128.		// inserti128.
// Case#3: inserting into 4,5,6,7 index needs 4*vpinsrd + inserti128.		// Case#3: inserting into 4,5,6,7 index needs 4*vpinsrd + inserti128.
▲ Show 20 Lines • Show All 207 Lines • ▼ Show 20 Lines	InstructionCost SingleShuffleCost =
/*Mask=*/None, /*Index=*/0, /*SubTp=*/nullptr);		/*Mask=*/None, /*Index=*/0, /*SubTp=*/nullptr);
return NumDstVectorsDemanded * SingleShuffleCost;		return NumDstVectorsDemanded * SingleShuffleCost;
}		}

InstructionCost X86TTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,		InstructionCost X86TTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,
MaybeAlign Alignment,		MaybeAlign Alignment,
unsigned AddressSpace,		unsigned AddressSpace,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
		TTI::OperandValueKind OpdInfo,
const Instruction *I) {		const Instruction *I) {
// TODO: Handle other cost kinds.		// TODO: Handle other cost kinds.
if (CostKind != TTI::TCK_RecipThroughput) {		if (CostKind != TTI::TCK_RecipThroughput) {
if (auto *SI = dyn_cast_or_null<StoreInst>(I)) {		if (auto *SI = dyn_cast_or_null<StoreInst>(I)) {
// Store instruction with index and scale costs 2 Uops.		// Store instruction with index and scale costs 2 Uops.
// Check the preceding GEP to identify non-const indices.		// Check the preceding GEP to identify non-const indices.
if (auto *GEP = dyn_cast<GetElementPtrInst>(SI->getPointerOperand())) {		if (auto *GEP = dyn_cast<GetElementPtrInst>(SI->getPointerOperand())) {
if (!all_of(GEP->indices(), [](Value *V) { return isa<Constant>(V); }))		if (!all_of(GEP->indices(), [](Value *V) { return isa<Constant>(V); }))
Show All 12 Lines	InstructionCost X86TTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,

// Legalize the type.		// Legalize the type.
std::pair<InstructionCost, MVT> LT = TLI->getTypeLegalizationCost(DL, Src);		std::pair<InstructionCost, MVT> LT = TLI->getTypeLegalizationCost(DL, Src);

auto *VTy = dyn_cast<FixedVectorType>(Src);		auto *VTy = dyn_cast<FixedVectorType>(Src);

// Handle the simple case of non-vectors.		// Handle the simple case of non-vectors.
// NOTE: this assumes that legalization never creates vector from scalars!		// NOTE: this assumes that legalization never creates vector from scalars!
if (!VTy || !LT.second.isVector())		if (!VTy || !LT.second.isVector()) {
		InstructionCost Cost = 0;
		if (Opcode == Instruction::Store && LT.second.isFloatingPoint() &&
		(OpdInfo == TTI::OK_UniformConstantValue ||
		OpdInfo == TTI::OK_NonUniformConstantValue))
		Cost += getMemoryOpCost(Instruction::Load, Src, DL.getABITypeAlign(Src),
		/*AddressSpace=*/0, CostKind);
// Each load/store unit costs 1.		// Each load/store unit costs 1.
return LT.first * 1;		return Cost + LT.first * 1;
		}

bool IsLoad = Opcode == Instruction::Load;		bool IsLoad = Opcode == Instruction::Load;

Type *EltTy = VTy->getElementType();		Type *EltTy = VTy->getElementType();

const int EltTyBits = DL.getTypeSizeInBits(EltTy);		const int EltTyBits = DL.getTypeSizeInBits(EltTy);

InstructionCost Cost = 0;		InstructionCost Cost = 0;

		// Add a cost for constant load to vector.
		if (Opcode == Instruction::Store &&
		(OpdInfo == TTI::OK_UniformConstantValue ||
		OpdInfo == TTI::OK_NonUniformConstantValue))
		Cost += getMemoryOpCost(Instruction::Load, Src, DL.getABITypeAlign(Src),
		/*AddressSpace=*/0, CostKind);
		RKSimon: Just do this once, before the (!VTy || !LT.second.isVector()) check?

// Source of truth: how many elements were there in the original IR vector?		// Source of truth: how many elements were there in the original IR vector?
const unsigned SrcNumElt = VTy->getNumElements();		const unsigned SrcNumElt = VTy->getNumElements();

// How far have we gotten?		// How far have we gotten?
int NumEltRemaining = SrcNumElt;		int NumEltRemaining = SrcNumElt;
// Note that we intentionally capture by-reference, NumEltRemaining changes.		// Note that we intentionally capture by-reference, NumEltRemaining changes.
auto NumEltDone = [&]() { return SrcNumElt - NumEltRemaining; };		auto NumEltDone = [&]() { return SrcNumElt - NumEltRemaining; };

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Show First 20 Lines • Show All 6,387 Lines • ▼ Show 20 Lines	LoopVectorizationCostModel::getConsecutiveMemOpCost(Instruction *I,
unsigned AS = getLoadStoreAddressSpace(I);		unsigned AS = getLoadStoreAddressSpace(I);
int ConsecutiveStride = Legal->isConsecutivePtr(ValTy, Ptr);		int ConsecutiveStride = Legal->isConsecutivePtr(ValTy, Ptr);
enum TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;		enum TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;

assert((ConsecutiveStride == 1 || ConsecutiveStride == -1) &&		assert((ConsecutiveStride == 1 || ConsecutiveStride == -1) &&
"Stride should be 1 or -1 for consecutive memory access");		"Stride should be 1 or -1 for consecutive memory access");
const Align Alignment = getLoadStoreAlignment(I);		const Align Alignment = getLoadStoreAlignment(I);
InstructionCost Cost = 0;		InstructionCost Cost = 0;
if (Legal->isMaskRequired(I))		if (Legal->isMaskRequired(I)) {
Cost += TTI.getMaskedMemoryOpCost(I->getOpcode(), VectorTy, Alignment, AS,		Cost += TTI.getMaskedMemoryOpCost(I->getOpcode(), VectorTy, Alignment, AS,
CostKind);		CostKind);
else		} else {
		TTI::OperandValueProperties OpVP = TTI::OP_None;
		TTI::OperandValueKind OpVK = TTI::getOperandInfo(I->getOperand(0), OpVP);
Cost += TTI.getMemoryOpCost(I->getOpcode(), VectorTy, Alignment, AS,		Cost += TTI.getMemoryOpCost(I->getOpcode(), VectorTy, Alignment, AS,
CostKind, I);		CostKind, OpVK, I);
		}

bool Reverse = ConsecutiveStride < 0;		bool Reverse = ConsecutiveStride < 0;
if (Reverse)		if (Reverse)
Cost +=		Cost +=
TTI.getShuffleCost(TargetTransformInfo::SK_Reverse, VectorTy, None, 0);		TTI.getShuffleCost(TargetTransformInfo::SK_Reverse, VectorTy, None, 0);
return Cost;		return Cost;
}		}

▲ Show 20 Lines • Show All 263 Lines • ▼ Show 20 Lines	LoopVectorizationCostModel::getMemoryInstructionCost(Instruction *I,
ElementCount VF) {		ElementCount VF) {
// Calculate scalar cost only. Vectorization cost should be ready at this		// Calculate scalar cost only. Vectorization cost should be ready at this
// moment.		// moment.
if (VF.isScalar()) {		if (VF.isScalar()) {
Type *ValTy = getLoadStoreType(I);		Type *ValTy = getLoadStoreType(I);
const Align Alignment = getLoadStoreAlignment(I);		const Align Alignment = getLoadStoreAlignment(I);
unsigned AS = getLoadStoreAddressSpace(I);		unsigned AS = getLoadStoreAddressSpace(I);

		TTI::OperandValueProperties OpVP = TTI::OP_None;
		TTI::OperandValueKind OpVK = TTI::getOperandInfo(I->getOperand(0), OpVP);
return TTI.getAddressComputationCost(ValTy) +		return TTI.getAddressComputationCost(ValTy) +
TTI.getMemoryOpCost(I->getOpcode(), ValTy, Alignment, AS,		TTI.getMemoryOpCost(I->getOpcode(), ValTy, Alignment, AS,
TTI::TCK_RecipThroughput, I);		TTI::TCK_RecipThroughput, OpVK, I);
}		}
return getWideningCost(I, VF);		return getWideningCost(I, VF);
}		}

LoopVectorizationCostModel::VectorizationCostTy		LoopVectorizationCostModel::VectorizationCostTy
LoopVectorizationCostModel::getInstructionCost(Instruction *I,		LoopVectorizationCostModel::getInstructionCost(Instruction *I,
ElementCount VF) {		ElementCount VF) {
// If we know that this instruction will remain uniform, check the cost of		// If we know that this instruction will remain uniform, check the cost of

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

Show First 20 Lines • Show All 5,899 Lines • ▼ Show 20 Lines	if (E->State == TreeEntry::NeedToGather) {
if (Shuffle) {		if (Shuffle) {
InstructionCost GatherCost = 0;		InstructionCost GatherCost = 0;
if (ShuffleVectorInst::isIdentityMask(Mask)) {		if (ShuffleVectorInst::isIdentityMask(Mask)) {
// Perfect match in the graph, will reuse the previously vectorized		// Perfect match in the graph, will reuse the previously vectorized
// node. Cost is 0.		// node. Cost is 0.
LLVM_DEBUG(		LLVM_DEBUG(
dbgs()		dbgs()
<< "SLP: perfect diamond match for gather bundle that starts with "		<< "SLP: perfect diamond match for gather bundle that starts with "
<< *VL.front() << ".\n");		<< *VL.front() << ".\n");
		vdmitrie: This estimate should be a bit more complicated. Here are the things that can additionally be considered:
		- For scalar floating-point ops, a constant operand is normally loaded from memory too.
		- If it is an operand of an instruction that becomes an immediate (like a shift value) and is a splat, the cost is zero.
		- For a scalar integer op, a constant operand is typically an immediate, so this estimate works in most cases, but there is an exception: 64-bit operations on a 32-bit target. That should be taken into account too.
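
The considerations in the comment above can be sketched as a standalone toy model. This is not the patch's code and not an LLVM API: ConstOperandInfo, scalarConstCost, vectorConstCost and the unit costs are all hypothetical, and charging one load per lane for an integer constant wider than the largest legal integer type is an assumption, since the comment only says that case "should be taken into account".

```cpp
// Hypothetical model of the review suggestion; all names are assumptions.
enum class ScalarKind { Integer, FloatingPoint };

struct ConstOperandInfo {
  ScalarKind Kind;
  unsigned BitWidth;     // width of the scalar operation in bits
  bool IsSplat;          // every lane holds the same constant
  bool FoldsToImmediate; // e.g. a shift amount encoded in the instruction
};

// Cost of the constant operand summed over NumLanes scalar instructions.
int scalarConstCost(const ConstOperandInfo &Op, int NumLanes,
                    unsigned LargestLegalIntBits, int LoadCost) {
  // Scalar FP constants are normally loaded from memory.
  if (Op.Kind == ScalarKind::FloatingPoint)
    return NumLanes * LoadCost;
  // Assumption: an integer wider than the largest legal integer type
  // (64-bit op on a 32-bit target) cannot be treated as a free immediate.
  if (Op.BitWidth > LargestLegalIntBits)
    return NumLanes * LoadCost;
  // Otherwise the integer constant is typically an immediate.
  return 0;
}

// Cost of the same constant operand once the operation is vectorized.
int vectorConstCost(const ConstOperandInfo &Op, int VectorLoadCost) {
  // A splat operand that becomes an immediate costs nothing.
  if (Op.FoldsToImmediate && Op.IsSplat)
    return 0;
  // Otherwise assume a vector load from the constant pool.
  return VectorLoadCost;
}
```

The quantity of interest for the vectorizer would be the difference between the two, so a model like this only matters where the scalar and vector sides disagree about whether the constant is free.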
if (NeedToShuffleReuses)		if (NeedToShuffleReuses)
GatherCost =		GatherCost =
TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,		TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,
FinalVecTy, E->ReuseShuffleIndices);		FinalVecTy, E->ReuseShuffleIndices);
		vdmitrie: Just wondering, is it possible for UserTreeIndices to be empty here? AFAIU it can be for the root only, but constants do not seed the vtree. If the alternate opcodes are shl/shr but the shift value is a splat, it can still be an immediate for both of them.
		ABataev (Author): 1. If constants are reduced values in reduction ops. 2. That's why there is a TODO above.
		vdmitrie: Okay, although I believe it is not the SLP vectorizer's job to do constant folding.
		ABataev (Author): Do you suggest hiding it in getConstBuildVectorInstrCost and returning the difference? Or just adding a new member function?
		vdmitrie: Alternate opcodes are SLP vectorizer specific. For that reason, trying to sink that logic into the TTI interface does not look like the right thing to do. But outlining this whole new code into a separate member is a good idea. What sounds weird to me is that constants may seed the vtree for a reduction. Although that is not directly related to this patch, you are placing workarounds for it here. IMO it is impractical to run constant reductions through the SLP vectorizer machinery. Probably, to make the workaround for that issue simpler in this patch, add an early return: if (E->UserTreeIndices.empty()) return 0; Otherwise it will be returning the memory-op cost for a foldable operation.
		ABataev (Author):
		> What sounds weird for me is that constants may seed vtree for reduction.
		InstCombiner and other passes are not always able to handle them (or require some extra work and compile time). E.g.:
		define i32 @foo(i32 %v, i32 %a) {
		  %s1 = add i32 %v, 1
		  %s2 = add i32 %a, 2
		  %s3 = add i32 %s1, %s2
		  %s11 = add i32 %v, %a
		  %s31 = add i32 %s3, %s11
		  %s4 = add i32 %v, 3
		  %s5 = add i32 %a, 4
		  %s6 = add i32 %s4, %s5
		  %s7 = add i32 %s31, %s6
		  ret i32 %s7
		}
		SLP is able to transform it to:
		define i32 @foo(i32 %v, i32 %a) {
		  %1 = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> <i32 4, i32 3, i32 2, i32 1>)
		  %op.rdx = add i32 %a, %a
		  %op.rdx1 = add i32 %a, %v
		  %op.rdx2 = add i32 %v, %v
		  %op.rdx3 = add i32 %op.rdx, %op.rdx1
		  %op.rdx4 = add i32 %op.rdx3, %op.rdx2
		  %op.rdx5 = add i32 %1, %op.rdx4
		  ret i32 %op.rdx5
		}
		which can be optimized to %1 = i32 10. But I agree that it requires improvement. We don't need to estimate the cost and emit a reduction here. I have a patch that improves it. I need to work on it for some time, though.
} else {		} else {
LLVM_DEBUG(dbgs() << "SLP: shuffled " << Entries.size()		LLVM_DEBUG(dbgs() << "SLP: shuffled " << Entries.size()
<< " entries for bundle that starts with "		<< " entries for bundle that starts with "
<< *VL.front() << ".\n");		<< *VL.front() << ".\n");
// Detected that instead of gather we can emit a shuffle of single/two		// Detected that instead of gather we can emit a shuffle of single/two
// previously vectorized nodes. Add the cost of the permutation rather		// previously vectorized nodes. Add the cost of the permutation rather
// than gather.		// than gather.
::addMask(Mask, E->ReuseShuffleIndices);		::addMask(Mask, E->ReuseShuffleIndices);
GatherCost = TTI->getShuffleCost(*Shuffle, FinalVecTy, Mask);		GatherCost = TTI->getShuffleCost(*Shuffle, FinalVecTy, Mask);
}		}
return GatherCost;		return GatherCost;
}		}
if ((E->getOpcode() == Instruction::ExtractElement ||		if ((E->getOpcode() == Instruction::ExtractElement ||
all_of(E->Scalars,		all_of(E->Scalars,
[](Value *V) {		[](Value *V) {
return isa<ExtractElementInst, UndefValue>(V);		return isa<ExtractElementInst, UndefValue>(V);
})) &&		})) &&
allSameType(VL)) {		allSameType(VL)) {
// Check that gather of extractelements can be represented as just a		// Check that gather of extractelements can be represented as just a
		vdmitrie: Drop it?
		ABataev (Author): What do you mean?
		vdmitrie: Drop the extra definition of ScalarCost. Otherwise the loop at line 5807 updates the variable from line 5802, which is not used. Line 5811 will subtract the one defined at line 5795.
		ABataev (Author): Ah, yes, sure.
// shuffle of a single/two vectors the scalars are extracted from.		// shuffle of a single/two vectors the scalars are extracted from.
SmallVector<int> Mask;		SmallVector<int> Mask;
Optional<TargetTransformInfo::ShuffleKind> ShuffleKind =		Optional<TargetTransformInfo::ShuffleKind> ShuffleKind =
isFixedVectorShuffle(VL, Mask);		isFixedVectorShuffle(VL, Mask);
if (ShuffleKind) {		if (ShuffleKind) {
		vdmitrie: Doesn't this interface already assume that a constant is a legal immediate? I was trying to explore this too, and I found that it does not seem to correctly cover 32-bit targets, specifically for 64-bit operations. Ideally we should have an interface that tells whether an immediate is a legal immediate operand for a target, but I have not found anything like that. One way to figure this out (which I found, and which may be wrong): when the condition DL->getTypeStoreSizeInBits(ScalarTy) > DL->getLargestLegalIntTypeSizeInBits() is true, we cannot assume the operand is a legal immediate.
		ABataev (Author): I'll check it.
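
The width check proposed in the comment above can be sketched in isolation as follows. This is a standalone illustration, not LLVM code: largestLegalIntBits and mayAssumeLegalImmediate are hypothetical helpers, and the list of native integer widths is passed in by hand rather than read from a DataLayout. The check is only a conservative filter on the type width; it says nothing about whether a particular value fits an instruction's immediate field.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for DataLayout::getLargestLegalIntTypeSizeInBits():
// the widest entry among the target's native integer widths, or 0 if none.
unsigned largestLegalIntBits(const std::vector<unsigned> &NativeWidths) {
  if (NativeWidths.empty())
    return 0;
  return *std::max_element(NativeWidths.begin(), NativeWidths.end());
}

// Returns false when the scalar type is wider than any native integer type,
// i.e. when the operand must not be assumed to be a legal immediate.
bool mayAssumeLegalImmediate(unsigned ScalarStoreSizeInBits,
                             const std::vector<unsigned> &NativeWidths) {
  return ScalarStoreSizeInBits <= largestLegalIntBits(NativeWidths);
}
```

For example, with native widths {8, 16, 32} a 64-bit scalar is rejected, while with {8, 16, 32, 64} it passes the filter.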
// Found the bunch of extractelement instructions that must be gathered		// Found the bunch of extractelement instructions that must be gathered
// into a vector and can be represented as a permutation elements in a		// into a vector and can be represented as a permutation elements in a
// single input vector or of 2 input vectors.		// single input vector or of 2 input vectors.
InstructionCost Cost =		InstructionCost Cost =
computeExtractCost(VL, VecTy, ShuffleKind, Mask, TTI);		computeExtractCost(VL, VecTy, ShuffleKind, Mask, TTI);
AdjustExtractsCost(Cost);		AdjustExtractsCost(Cost);
if (NeedToShuffleReuses)		if (NeedToShuffleReuses)
Cost += TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,		Cost += TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,
▲ Show 20 Lines • Show All 74 Lines • ▼ Show 20 Lines	if (VL.size() > 2 && E->getOpcode() == Instruction::Load &&
GatherCost += getGatherCost(VL.slice(I, VF));		GatherCost += getGatherCost(VL.slice(I, VF));
}		}
// The cost for vectorized loads.		// The cost for vectorized loads.
InstructionCost ScalarsCost = 0;		InstructionCost ScalarsCost = 0;
for (Value *V : VectorizedLoads) {		for (Value *V : VectorizedLoads) {
auto *LI = cast<LoadInst>(V);		auto *LI = cast<LoadInst>(V);
ScalarsCost += TTI->getMemoryOpCost(		ScalarsCost += TTI->getMemoryOpCost(
Instruction::Load, LI->getType(), LI->getAlign(),		Instruction::Load, LI->getType(), LI->getAlign(),
LI->getPointerAddressSpace(), CostKind, LI);		LI->getPointerAddressSpace(), CostKind, TTI::OK_AnyValue, LI);
}		}
auto *LI = cast<LoadInst>(E->getMainOp());		auto *LI = cast<LoadInst>(E->getMainOp());
auto *LoadTy = FixedVectorType::get(LI->getType(), VF);		auto *LoadTy = FixedVectorType::get(LI->getType(), VF);
Align Alignment = LI->getAlign();		Align Alignment = LI->getAlign();
GatherCost +=		GatherCost += VectorizedCnt *
VectorizedCnt *
TTI->getMemoryOpCost(Instruction::Load, LoadTy, Alignment,		TTI->getMemoryOpCost(Instruction::Load, LoadTy, Alignment,
LI->getPointerAddressSpace(), CostKind, LI);		LI->getPointerAddressSpace(),
		CostKind, TTI::OK_AnyValue, LI);
GatherCost += ScatterVectorizeCnt *		GatherCost += ScatterVectorizeCnt *
TTI->getGatherScatterOpCost(		TTI->getGatherScatterOpCost(
Instruction::Load, LoadTy, LI->getPointerOperand(),		Instruction::Load, LoadTy, LI->getPointerOperand(),
/*VariableMask=*/false, Alignment, CostKind, LI);		/*VariableMask=*/false, Alignment, CostKind, LI);
if (NeedInsertSubvectorAnalysis) {		if (NeedInsertSubvectorAnalysis) {
// Add the cost for the subvectors insert.		// Add the cost for the subvectors insert.
for (int I = VF, E = VL.size(); I < E; I += VF)		for (int I = VF, E = VL.size(); I < E; I += VF)
GatherCost += TTI->getShuffleCost(TTI::SK_InsertSubvector, VecTy,		GatherCost += TTI->getShuffleCost(TTI::SK_InsertSubvector, VecTy,
[343 lines not shown]	case Instruction::Xor: {
SmallVector<const Value *, 4> Operands(VL0->operand_values());		SmallVector<const Value *, 4> Operands(VL0->operand_values());
InstructionCost ScalarEltCost =		InstructionCost ScalarEltCost =
TTI->getArithmeticInstrCost(E->getOpcode(), ScalarTy, CostKind, Op1VK,		TTI->getArithmeticInstrCost(E->getOpcode(), ScalarTy, CostKind, Op1VK,
Op2VK, Op1VP, Op2VP, Operands, VL0);		Op2VK, Op1VP, Op2VP, Operands, VL0);
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
CommonCost -= (EntryVF - VL.size()) * ScalarEltCost;		CommonCost -= (EntryVF - VL.size()) * ScalarEltCost;
}		}
InstructionCost ScalarCost = VecTy->getNumElements() * ScalarEltCost;		InstructionCost ScalarCost = VecTy->getNumElements() * ScalarEltCost;
		for (unsigned I = 0, Num = VL0->getNumOperands(); I < Num; ++I) {
		if (all_of(VL, [I](Value *V) {
		return isConstant(cast<Instruction>(V)->getOperand(I));
		}))
		Operands[I] = ConstantVector::getNullValue(VecTy);
		}
InstructionCost VecCost =		InstructionCost VecCost =
TTI->getArithmeticInstrCost(E->getOpcode(), VecTy, CostKind, Op1VK,		TTI->getArithmeticInstrCost(E->getOpcode(), VecTy, CostKind, Op1VK,
Op2VK, Op1VP, Op2VP, Operands, VL0);		Op2VK, Op1VP, Op2VP, Operands, VL0);
LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecCost, ScalarCost));		LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecCost, ScalarCost));
return CommonCost + VecCost - ScalarCost;		return CommonCost + VecCost - ScalarCost;
}		}
case Instruction::GetElementPtr: {		case Instruction::GetElementPtr: {
TargetTransformInfo::OperandValueKind Op1VK =		TargetTransformInfo::OperandValueKind Op1VK =
[17 lines not shown]	case Instruction::GetElementPtr: {
InstructionCost VecCost = TTI->getArithmeticInstrCost(		InstructionCost VecCost = TTI->getArithmeticInstrCost(
Instruction::Add, VecTy, CostKind, Op1VK, Op2VK);		Instruction::Add, VecTy, CostKind, Op1VK, Op2VK);
LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecCost, ScalarCost));		LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecCost, ScalarCost));
return CommonCost + VecCost - ScalarCost;		return CommonCost + VecCost - ScalarCost;
}		}
case Instruction::Load: {		case Instruction::Load: {
// Cost of wide load - cost of scalar loads.		// Cost of wide load - cost of scalar loads.
Align Alignment = cast<LoadInst>(VL0)->getAlign();		Align Alignment = cast<LoadInst>(VL0)->getAlign();
InstructionCost ScalarEltCost = TTI->getMemoryOpCost(		InstructionCost ScalarEltCost =
Instruction::Load, ScalarTy, Alignment, 0, CostKind, VL0);		TTI->getMemoryOpCost(Instruction::Load, ScalarTy, Alignment, 0,
		CostKind, TTI::OK_AnyValue, VL0);
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
CommonCost -= (EntryVF - VL.size()) * ScalarEltCost;		CommonCost -= (EntryVF - VL.size()) * ScalarEltCost;
}		}
InstructionCost ScalarLdCost = VecTy->getNumElements() * ScalarEltCost;		InstructionCost ScalarLdCost = VecTy->getNumElements() * ScalarEltCost;
InstructionCost VecLdCost;		InstructionCost VecLdCost;
if (E->State == TreeEntry::Vectorize) {		if (E->State == TreeEntry::Vectorize) {
VecLdCost = TTI->getMemoryOpCost(Instruction::Load, VecTy, Alignment, 0,		VecLdCost = TTI->getMemoryOpCost(Instruction::Load, VecTy, Alignment, 0,
CostKind, VL0);		CostKind, TTI::OK_AnyValue, VL0);
} else {		} else {
assert(E->State == TreeEntry::ScatterVectorize && "Unknown EntryState");		assert(E->State == TreeEntry::ScatterVectorize && "Unknown EntryState");
Align CommonAlignment = Alignment;		Align CommonAlignment = Alignment;
for (Value *V : VL)		for (Value *V : VL)
CommonAlignment =		CommonAlignment =
commonAlignment(CommonAlignment, cast<LoadInst>(V)->getAlign());		commonAlignment(CommonAlignment, cast<LoadInst>(V)->getAlign());
VecLdCost = TTI->getGatherScatterOpCost(		VecLdCost = TTI->getGatherScatterOpCost(
Instruction::Load, VecTy, cast<LoadInst>(VL0)->getPointerOperand(),		Instruction::Load, VecTy, cast<LoadInst>(VL0)->getPointerOperand(),
/*VariableMask=*/false, CommonAlignment, CostKind, VL0);		/*VariableMask=*/false, CommonAlignment, CostKind, VL0);
}		}
LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecLdCost, ScalarLdCost));		LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecLdCost, ScalarLdCost));
return CommonCost + VecLdCost - ScalarLdCost;		return CommonCost + VecLdCost - ScalarLdCost;
}		}
case Instruction::Store: {		case Instruction::Store: {
// We know that we can merge the stores. Calculate the cost.		// We know that we can merge the stores. Calculate the cost.
bool IsReorder = !E->ReorderIndices.empty();		bool IsReorder = !E->ReorderIndices.empty();
auto *SI =		auto *SI =
cast<StoreInst>(IsReorder ? VL[E->ReorderIndices.front()] : VL0);		cast<StoreInst>(IsReorder ? VL[E->ReorderIndices.front()] : VL0);
Align Alignment = SI->getAlign();		Align Alignment = SI->getAlign();
		TTI::OperandValueProperties OpVP = TTI::OP_None;
		TTI::OperandValueKind OpVK = TTI::getOperandInfo(SI->getOperand(0), OpVP);
InstructionCost ScalarEltCost = TTI->getMemoryOpCost(		InstructionCost ScalarEltCost = TTI->getMemoryOpCost(
Instruction::Store, ScalarTy, Alignment, 0, CostKind, VL0);		Instruction::Store, ScalarTy, Alignment, 0, CostKind, OpVK, VL0);
InstructionCost ScalarStCost = VecTy->getNumElements() * ScalarEltCost;		InstructionCost ScalarStCost = VecTy->getNumElements() * ScalarEltCost;
		OpVK = TTI::OK_AnyValue;
		if (all_of(E->Scalars,
		[](Value *V) {
		return isConstant(cast<Instruction>(V)->getOperand(0));
		}) &&
		any_of(E->Scalars, [](Value *V) {
		Value *Op = cast<Instruction>(V)->getOperand(0);
		return !isa<UndefValue>(Op) && !cast<Constant>(Op)->isZeroValue();
		}))
		OpVK = TTI::OK_NonUniformConstantValue;
InstructionCost VecStCost = TTI->getMemoryOpCost(		InstructionCost VecStCost = TTI->getMemoryOpCost(
Instruction::Store, VecTy, Alignment, 0, CostKind, VL0);		Instruction::Store, VecTy, Alignment, 0, CostKind, OpVK, VL0);
LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecStCost, ScalarStCost));		LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecStCost, ScalarStCost));
return CommonCost + VecStCost - ScalarStCost;		return CommonCost + VecStCost - ScalarStCost;
}		}
case Instruction::Call: {		case Instruction::Call: {
CallInst *CI = cast<CallInst>(VL0);		CallInst *CI = cast<CallInst>(VL0);
Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);		Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);

// Calculate the cost of the scalar and vector calls.		// Calculate the cost of the scalar and vector calls.
IntrinsicCostAttributes CostAttrs(ID, *CI, 1);		IntrinsicCostAttributes CostAttrs(ID, *CI, 1);
InstructionCost ScalarEltCost =		InstructionCost ScalarEltCost =
[5,977 lines not shown]
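
As a rough illustration of the `Instruction::Store` case in the SLPVectorizer.cpp diff above, the following standalone sketch mirrors how the operand kind for the vector store appears to be chosen: the stored values are classified as a non-uniform constant only when every one of them is a constant and at least one is neither undef nor zero. `ToyOperand`, `ToyOperandKind`, `classifyStoredOperands` and `exampleUsage` are hypothetical names invented for this example; they are not LLVM API and only model the `all_of`/`any_of` condition from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for TTI::OperandValueKind; not LLVM API.
enum class ToyOperandKind { AnyValue, NonUniformConstantValue };

// Hypothetical stand-in for a stored scalar value; not LLVM API.
struct ToyOperand {
  bool IsConstant; // models isConstant(V)
  bool IsUndef;    // models isa<UndefValue>(V)
  bool IsZero;     // models cast<Constant>(V)->isZeroValue()
};

// Mirrors the condition in the patch: all stored values are constants and
// at least one of them is neither undef nor zero.
ToyOperandKind classifyStoredOperands(const std::vector<ToyOperand> &Ops) {
  bool AllConstant =
      std::all_of(Ops.begin(), Ops.end(),
                  [](const ToyOperand &Op) { return Op.IsConstant; });
  bool AnyNonTrivial =
      std::any_of(Ops.begin(), Ops.end(), [](const ToyOperand &Op) {
        return !Op.IsUndef && !Op.IsZero;
      });
  return (AllConstant && AnyNonTrivial)
             ? ToyOperandKind::NonUniformConstantValue
             : ToyOperandKind::AnyValue;
}

// Small usage example: storing <i64 0, i64 1> versus storing all zeros.
void exampleUsage() {
  std::vector<ToyOperand> ZeroOne = {{true, false, true}, {true, false, false}};
  std::vector<ToyOperand> AllZero = {{true, false, true}, {true, false, true}};
  assert(classifyStoredOperands(ZeroOne) ==
         ToyOperandKind::NonUniformConstantValue);
  assert(classifyStoredOperands(AllZero) == ToyOperandKind::AnyValue);
}
```

In this sketch, all-zero or all-undef operands keep the default kind, presumably because such vectors can usually be materialized without a constant-pool load; that reading is an interpretation of the patch summary, not something the diff states explicitly.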

llvm/test/Analysis/CostModel/X86/arith-fp.ll

[623 lines not shown]	;
%V8F64 = fdiv <8 x double> undef, undef		%V8F64 = fdiv <8 x double> undef, undef

ret i32 undef		ret i32 undef
}		}

define i32 @frem(i32 %arg) {		define i32 @frem(i32 %arg) {
; SSE1-LABEL: 'frem'		; SSE1-LABEL: 'frem'
; SSE1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef		; SSE1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef
; SSE1-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V4F32 = frem <4 x float> undef, undef		; SSE1-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V4F32 = frem <4 x float> undef, undef
; SSE1-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V8F32 = frem <8 x float> undef, undef		; SSE1-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %V8F32 = frem <8 x float> undef, undef
; SSE1-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V16F32 = frem <16 x float> undef, undef		; SSE1-NEXT: Cost Model: Found an estimated cost of 44 for instruction: %V16F32 = frem <16 x float> undef, undef
; SSE1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef		; SSE1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef
; SSE1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V2F64 = frem <2 x double> undef, undef		; SSE1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V2F64 = frem <2 x double> undef, undef
; SSE1-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V4F64 = frem <4 x double> undef, undef		; SSE1-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V4F64 = frem <4 x double> undef, undef
; SSE1-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V8F64 = frem <8 x double> undef, undef		; SSE1-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V8F64 = frem <8 x double> undef, undef
; SSE1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; SSE1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
; SSE2-LABEL: 'frem'		; SSE2-LABEL: 'frem'
; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef		; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef
; SSE2-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V4F32 = frem <4 x float> undef, undef		; SSE2-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V4F32 = frem <4 x float> undef, undef
; SSE2-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V8F32 = frem <8 x float> undef, undef		; SSE2-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %V8F32 = frem <8 x float> undef, undef
; SSE2-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V16F32 = frem <16 x float> undef, undef		; SSE2-NEXT: Cost Model: Found an estimated cost of 44 for instruction: %V16F32 = frem <16 x float> undef, undef
; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef		; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef
; SSE2-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2F64 = frem <2 x double> undef, undef		; SSE2-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V2F64 = frem <2 x double> undef, undef
; SSE2-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V4F64 = frem <4 x double> undef, undef		; SSE2-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V4F64 = frem <4 x double> undef, undef
; SSE2-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V8F64 = frem <8 x double> undef, undef		; SSE2-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V8F64 = frem <8 x double> undef, undef
; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
; SSE42-LABEL: 'frem'		; SSE42-LABEL: 'frem'
; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef		; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef
; SSE42-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V4F32 = frem <4 x float> undef, undef		; SSE42-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V4F32 = frem <4 x float> undef, undef
; SSE42-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V8F32 = frem <8 x float> undef, undef		; SSE42-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %V8F32 = frem <8 x float> undef, undef
; SSE42-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V16F32 = frem <16 x float> undef, undef		; SSE42-NEXT: Cost Model: Found an estimated cost of 44 for instruction: %V16F32 = frem <16 x float> undef, undef
; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef		; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef
; SSE42-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2F64 = frem <2 x double> undef, undef		; SSE42-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V2F64 = frem <2 x double> undef, undef
; SSE42-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V4F64 = frem <4 x double> undef, undef		; SSE42-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V4F64 = frem <4 x double> undef, undef
; SSE42-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V8F64 = frem <8 x double> undef, undef		; SSE42-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V8F64 = frem <8 x double> undef, undef
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
; AVX-LABEL: 'frem'		; AVX-LABEL: 'frem'
; AVX-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef		; AVX-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef
; AVX-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V4F32 = frem <4 x float> undef, undef		; AVX-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V4F32 = frem <4 x float> undef, undef
; AVX-NEXT: Cost Model: Found an estimated cost of 31 for instruction: %V8F32 = frem <8 x float> undef, undef		; AVX-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V8F32 = frem <8 x float> undef, undef
; AVX-NEXT: Cost Model: Found an estimated cost of 62 for instruction: %V16F32 = frem <16 x float> undef, undef		; AVX-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V16F32 = frem <16 x float> undef, undef
; AVX-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef		; AVX-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef
; AVX-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2F64 = frem <2 x double> undef, undef		; AVX-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V2F64 = frem <2 x double> undef, undef
; AVX-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V4F64 = frem <4 x double> undef, undef		; AVX-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V4F64 = frem <4 x double> undef, undef
; AVX-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V8F64 = frem <8 x double> undef, undef		; AVX-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %V8F64 = frem <8 x double> undef, undef
; AVX-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; AVX-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
; AVX512-LABEL: 'frem'		; AVX512-LABEL: 'frem'
; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef		; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef
; AVX512-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V4F32 = frem <4 x float> undef, undef		; AVX512-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V4F32 = frem <4 x float> undef, undef
; AVX512-NEXT: Cost Model: Found an estimated cost of 31 for instruction: %V8F32 = frem <8 x float> undef, undef		; AVX512-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V8F32 = frem <8 x float> undef, undef
; AVX512-NEXT: Cost Model: Found an estimated cost of 63 for instruction: %V16F32 = frem <16 x float> undef, undef		; AVX512-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V16F32 = frem <16 x float> undef, undef
; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef		; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef
; AVX512-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2F64 = frem <2 x double> undef, undef		; AVX512-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V2F64 = frem <2 x double> undef, undef
; AVX512-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V4F64 = frem <4 x double> undef, undef		; AVX512-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V4F64 = frem <4 x double> undef, undef
; AVX512-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %V8F64 = frem <8 x double> undef, undef		; AVX512-NEXT: Cost Model: Found an estimated cost of 23 for instruction: %V8F64 = frem <8 x double> undef, undef
; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
; SLM-LABEL: 'frem'		; SLM-LABEL: 'frem'
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef		; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef
; SLM-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V4F32 = frem <4 x float> undef, undef		; SLM-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V4F32 = frem <4 x float> undef, undef
; SLM-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V8F32 = frem <8 x float> undef, undef		; SLM-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %V8F32 = frem <8 x float> undef, undef
; SLM-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V16F32 = frem <16 x float> undef, undef		; SLM-NEXT: Cost Model: Found an estimated cost of 44 for instruction: %V16F32 = frem <16 x float> undef, undef
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef		; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef
; SLM-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2F64 = frem <2 x double> undef, undef		; SLM-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V2F64 = frem <2 x double> undef, undef
; SLM-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V4F64 = frem <4 x double> undef, undef		; SLM-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V4F64 = frem <4 x double> undef, undef
; SLM-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V8F64 = frem <8 x double> undef, undef		; SLM-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V8F64 = frem <8 x double> undef, undef
; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
; GLM-LABEL: 'frem'		; GLM-LABEL: 'frem'
; GLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef		; GLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef
; GLM-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V4F32 = frem <4 x float> undef, undef		; GLM-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V4F32 = frem <4 x float> undef, undef
; GLM-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V8F32 = frem <8 x float> undef, undef		; GLM-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %V8F32 = frem <8 x float> undef, undef
; GLM-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V16F32 = frem <16 x float> undef, undef		; GLM-NEXT: Cost Model: Found an estimated cost of 44 for instruction: %V16F32 = frem <16 x float> undef, undef
; GLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef		; GLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef
; GLM-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2F64 = frem <2 x double> undef, undef		; GLM-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V2F64 = frem <2 x double> undef, undef
; GLM-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V4F64 = frem <4 x double> undef, undef		; GLM-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V4F64 = frem <4 x double> undef, undef
; GLM-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V8F64 = frem <8 x double> undef, undef		; GLM-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V8F64 = frem <8 x double> undef, undef
; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%F32 = frem float undef, undef		%F32 = frem float undef, undef
%V4F32 = frem <4 x float> undef, undef		%V4F32 = frem <4 x float> undef, undef
%V8F32 = frem <8 x float> undef, undef		%V8F32 = frem <8 x float> undef, undef
%V16F32 = frem <16 x float> undef, undef		%V16F32 = frem <16 x float> undef, undef

%F64 = frem double undef, undef		%F64 = frem double undef, undef
[421 lines not shown]

llvm/test/Transforms/SLPVectorizer/RISCV/rvv-min-vector-size.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -slp-vectorizer -mtriple=riscv64 -mattr=+v \		; RUN: opt < %s -slp-vectorizer -mtriple=riscv64 -mattr=+v \
; RUN: -riscv-v-vector-bits-min=128 -S | FileCheck %s --check-prefixes=CHECK,CHECK-128		; RUN: -riscv-v-vector-bits-min=128 -S | FileCheck %s --check-prefixes=CHECK,CHECK-128
; RUN: opt < %s -slp-vectorizer -mtriple=riscv64 -mattr=+v \		; RUN: opt < %s -slp-vectorizer -mtriple=riscv64 -mattr=+v \
; RUN: -riscv-v-vector-bits-min=256 -S | FileCheck %s --check-prefixes=CHECK,CHECK-256		; RUN: -riscv-v-vector-bits-min=256 -S | FileCheck %s --check-prefixes=CHECK,CHECK-256
; RUN: opt < %s -slp-vectorizer -mtriple=riscv64 -mattr=+v \		; RUN: opt < %s -slp-vectorizer -mtriple=riscv64 -mattr=+v \
; RUN: -riscv-v-vector-bits-min=512 -S | FileCheck %s --check-prefixes=CHECK,CHECK-512		; RUN: -riscv-v-vector-bits-min=512 -S | FileCheck %s --check-prefixes=CHECK,CHECK-512

target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n64-S128"		target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n64-S128"
target triple = "riscv64"		target triple = "riscv64"

define void @foo(i64* nocapture writeonly %da) {		define void @foo(i64* nocapture writeonly %da) {
; CHECK-128-LABEL: @foo(		; CHECK-128-LABEL: @foo(
; CHECK-128-NEXT: entry:		; CHECK-128-NEXT: entry:
; CHECK-128-NEXT: [[TMP0:%.*]] = bitcast i64* [[DA:%.*]] to <2 x i64>*		; CHECK-128-NEXT: store i64 0, i64* [[DA:%.*]], align 8
; CHECK-128-NEXT: store <2 x i64> <i64 0, i64 1>, <2 x i64>* [[TMP0]], align 8		; CHECK-128-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, i64* [[DA]], i64 1
		; CHECK-128-NEXT: store i64 1, i64* [[ARRAYIDX1]], align 8
; CHECK-128-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i64, i64* [[DA]], i64 2		; CHECK-128-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i64, i64* [[DA]], i64 2
; CHECK-128-NEXT: [[TMP1:%.*]] = bitcast i64* [[ARRAYIDX2]] to <2 x i64>*		; CHECK-128-NEXT: store i64 2, i64* [[ARRAYIDX2]], align 8
; CHECK-128-NEXT: store <2 x i64> <i64 2, i64 3>, <2 x i64>* [[TMP1]], align 8		; CHECK-128-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i64, i64* [[DA]], i64 3
		; CHECK-128-NEXT: store i64 3, i64* [[ARRAYIDX3]], align 8
; CHECK-128-NEXT: ret void		; CHECK-128-NEXT: ret void
;		;
; CHECK-256-LABEL: @foo(		; CHECK-256-LABEL: @foo(
; CHECK-256-NEXT: entry:		; CHECK-256-NEXT: entry:
; CHECK-256-NEXT: [[TMP0:%.*]] = bitcast i64* [[DA:%.*]] to <4 x i64>*		; CHECK-256-NEXT: [[TMP0:%.*]] = bitcast i64* [[DA:%.*]] to <4 x i64>*
; CHECK-256-NEXT: store <4 x i64> <i64 0, i64 1, i64 2, i64 3>, <4 x i64>* [[TMP0]], align 8		; CHECK-256-NEXT: store <4 x i64> <i64 0, i64 1, i64 2, i64 3>, <4 x i64>* [[TMP0]], align 8
; CHECK-256-NEXT: ret void		; CHECK-256-NEXT: ret void
;		;
[12 lines not shown]	entry:
%arrayidx3 = getelementptr inbounds i64, i64* %da, i64 3		%arrayidx3 = getelementptr inbounds i64, i64* %da, i64 3
store i64 3, i64* %arrayidx3, align 8		store i64 3, i64* %arrayidx3, align 8
ret void		ret void
}		}

define void @foo8(i8* nocapture writeonly %da) {		define void @foo8(i8* nocapture writeonly %da) {
; CHECK-LABEL: @foo8(		; CHECK-LABEL: @foo8(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[DA:%.*]] to <2 x i8>*		; CHECK-NEXT: store i8 0, i8* [[DA:%.*]], align 8
; CHECK-NEXT: store <2 x i8> <i8 0, i8 1>, <2 x i8>* [[TMP0]], align 8		; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i8, i8* [[DA]], i8 1
		; CHECK-NEXT: store i8 1, i8* [[ARRAYIDX1]], align 8
; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i8, i8* [[DA]], i8 2		; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i8, i8* [[DA]], i8 2
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
store i8 0, i8* %da, align 8		store i8 0, i8* %da, align 8
%arrayidx1 = getelementptr inbounds i8, i8* %da, i8 1		%arrayidx1 = getelementptr inbounds i8, i8* %da, i8 1
store i8 1, i8* %arrayidx1, align 8		store i8 1, i8* %arrayidx1, align 8
%arrayidx2 = getelementptr inbounds i8, i8* %da, i8 2		%arrayidx2 = getelementptr inbounds i8, i8* %da, i8 2
ret void		ret void
}		}

llvm/test/Transforms/SLPVectorizer/X86/crash_bullet.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.8.0"			target triple = "x86_64-apple-macosx10.8.0"

	%"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960" = type { i32, i32 }			%"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960" = type { i32, i32 }

	define void @_ZN23btGeneric6DofConstraint8getInfo1EPN17btTypedConstraint17btConstraintInfo1E(%"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960"* nocapture %info) {			define void @_ZN23btGeneric6DofConstraint8getInfo1EPN17btTypedConstraint17btConstraintInfo1E(%"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960"* nocapture %info) {
	; CHECK-LABEL: @_ZN23btGeneric6DofConstraint8getInfo1EPN17btTypedConstraint17btConstraintInfo1E(			; CHECK-LABEL: @_ZN23btGeneric6DofConstraint8getInfo1EPN17btTypedConstraint17btConstraintInfo1E(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 undef, label [[IF_ELSE:%.*]], label [[IF_THEN:%.*]]			; CHECK-NEXT: br i1 undef, label [[IF_ELSE:%.*]], label [[IF_THEN:%.*]]
	; CHECK: if.then:			; CHECK: if.then:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: if.else:			; CHECK: if.else:
	; CHECK-NEXT: [[M_NUMCONSTRAINTROWS4:%.*]] = getelementptr inbounds %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960", %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960"* [[INFO:%.*]], i64 0, i32 0			; CHECK-NEXT: [[M_NUMCONSTRAINTROWS4:%.*]] = getelementptr inbounds %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960", %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960"* [[INFO:%.*]], i64 0, i32 0
				; CHECK-NEXT: [[NUB5:%.*]] = getelementptr inbounds %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960", %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960"* [[INFO]], i64 0, i32 1
	; CHECK-NEXT: br i1 undef, label [[LAND_LHS_TRUE_I_1:%.*]], label [[IF_THEN7_1:%.*]]			; CHECK-NEXT: br i1 undef, label [[LAND_LHS_TRUE_I_1:%.*]], label [[IF_THEN7_1:%.*]]
	; CHECK: land.lhs.true.i.1:			; CHECK: land.lhs.true.i.1:
	; CHECK-NEXT: br i1 undef, label [[FOR_INC_1:%.*]], label [[IF_THEN7_1]]			; CHECK-NEXT: br i1 undef, label [[FOR_INC_1:%.*]], label [[IF_THEN7_1]]
	; CHECK: if.then7.1:			; CHECK: if.then7.1:
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[M_NUMCONSTRAINTROWS4]] to <2 x i32>*			; CHECK-NEXT: store i32 1, i32* [[M_NUMCONSTRAINTROWS4]], align 4
	; CHECK-NEXT: store <2 x i32> <i32 1, i32 5>, <2 x i32>* [[TMP0]], align 4			; CHECK-NEXT: store i32 5, i32* [[NUB5]], align 4
	; CHECK-NEXT: br label [[FOR_INC_1]]			; CHECK-NEXT: br label [[FOR_INC_1]]
	; CHECK: for.inc.1:			; CHECK: for.inc.1:
	; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ <i32 1, i32 5>, [[IF_THEN7_1]] ], [ <i32 0, i32 6>, [[LAND_LHS_TRUE_I_1]] ]			; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x i32> [ <i32 1, i32 5>, [[IF_THEN7_1]] ], [ <i32 0, i32 6>, [[LAND_LHS_TRUE_I_1]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = add nsw <2 x i32> [[TMP1]], <i32 1, i32 -1>			; CHECK-NEXT: [[TMP1:%.*]] = add nsw <2 x i32> [[TMP0]], <i32 1, i32 -1>
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[M_NUMCONSTRAINTROWS4]] to <2 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[M_NUMCONSTRAINTROWS4]] to <2 x i32>*
	; CHECK-NEXT: store <2 x i32> [[TMP2]], <2 x i32>* [[TMP3]], align 4			; CHECK-NEXT: store <2 x i32> [[TMP1]], <2 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;
	entry:			entry:
	br i1 undef, label %if.else, label %if.then			br i1 undef, label %if.else, label %if.then

	if.then: ; preds = %entry			if.then: ; preds = %entry
	ret void			ret void

	[171 lines not shown]

Revision Contents

Diff 439821

llvm/include/llvm/Analysis/TargetTransformInfo.h

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

llvm/include/llvm/CodeGen/BasicTTIImpl.h

llvm/lib/Analysis/TargetTransformInfo.cpp

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

llvm/lib/Target/ARM/ARMTargetTransformInfo.h

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp

llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.h

llvm/lib/Target/Hexagon/HexagonTargetTransformInfo.cpp

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h

llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp

llvm/lib/Target/X86/X86TargetTransformInfo.h

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

llvm/test/Analysis/CostModel/X86/arith-fp.ll

llvm/test/Transforms/SLPVectorizer/RISCV/rvv-min-vector-size.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_bullet.ll
