This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Cost for a constant buildvector.
ClosedPublic

Authored by ABataev on Jun 2 2022, 7:56 AM.

Details

Reviewers

RKSimon
craig.topper

Commits

rG0e7ed32c7136: [SLP]Cost for a constant buildvector.

Summary

Usually, a constant buildvector results in a vector load from a
constant/data pool.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ABataev created this revision.Jun 2 2022, 7:56 AM

Herald added a project: Restricted Project. · Jun 2 2022, 7:56 AM

Herald added subscribers: vporpo, StephenFan, frasercrmck and 21 others.

ABataev requested review of this revision.Jun 2 2022, 7:56 AM

Herald added a project: Restricted Project. · Jun 2 2022, 7:56 AM

Herald added subscribers: pcwang-thead, MaskRay.

Harbormaster completed remote builds in B167520: Diff 433753.Jun 2 2022, 8:33 AM

Some unfortunate regressions

llvm/test/Transforms/SLPVectorizer/X86/pr46983.ll
146 ↗	(On Diff #433753)	Regression https://github.com/llvm/llvm-project/issues/46327

vdmitrie added a subscriber: vdmitrie.Jun 2 2022, 8:49 AM

vdmitrie added inline comments.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5932	This estimate should be a bit more complicated. Additional things to consider: (1) for scalar floating-point ops, a constant operand is normally loaded from memory too; (2) if the constant is an operand of an instruction where it becomes an immediate (like a shift value) and is a splat, the cost is zero; (3) for a scalar integer op, a constant operand is typically an immediate, so this estimate works in most cases, with one exception: 64-bit operations on a 32-bit target. That should be taken into account too.
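The considerations in this comment can be sketched as a toy cost rule. This is only an illustration under stated assumptions, not the code in the patch: the struct, the function names and the unit cost of "one load" are all hypothetical, and the rule that an over-wide integer constant costs one load is an assumption made for the sketch.

```cpp
#include <cassert>

// Hypothetical description of one constant operand; these names are
// illustrative and do not come from LLVM.
struct ConstOperand {
  bool IsFloatingPoint;    // FP constants are normally loaded from memory
  bool IsSplat;            // every lane holds the same value
  bool UserTakesImmediate; // e.g. a shift amount encoded in the instruction
  unsigned ElementBits;    // width of one scalar element
};

// Cost of the constant for ONE scalar instruction, in units of "one load".
int scalarConstantCost(const ConstOperand &Op, unsigned LegalIntBits) {
  assert(Op.ElementBits > 0 && "element width must be known");
  if (Op.IsFloatingPoint)
    return 1; // loaded from memory, like the vector constant
  if (Op.ElementBits > LegalIntBits)
    return 1; // assumption: e.g. a 64-bit op on a 32-bit target, not a legal immediate
  return 0;   // typical integer case: the constant is an immediate
}

// Cost of the constant for the vectorized instruction.
int vectorConstantCost(const ConstOperand &Op) {
  if (Op.IsSplat && Op.UserTakesImmediate)
    return 0; // a splat shift amount can stay an immediate
  return 1;   // otherwise one load from the constant pool
}

// Net change from vectorizing NumLanes scalar instructions (negative = cheaper).
int constantCostDelta(const ConstOperand &Op, unsigned NumLanes,
                      unsigned LegalIntBits) {
  return vectorConstantCost(Op) -
         static_cast<int>(NumLanes) * scalarConstantCost(Op, LegalIntBits);
}
```

With these toy numbers, four i32 lanes on a 64-bit target add one load when vectorized, while four FP lanes replace four scalar loads with one vector load.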

Address comments.

Herald added subscribers: jsji, luke957, pengfei, arichardson. · Jun 3 2022, 12:46 PM

ABataev added inline comments.Jun 3 2022, 12:51 PM

llvm/test/Transforms/SLPVectorizer/X86/pr46983.ll
146 ↗	(On Diff #433753)	llvm-mca reports throughputs: scalar code: 5.5; AVX vector: 8.0; AVX2 vector: 5.0. https://godbolt.org/z/rEc74dxza

xbolva00 added inline comments.Jun 3 2022, 1:03 PM

llvm/test/Transforms/SLPVectorizer/X86/pr46983.ll
146 ↗	(On Diff #433753)	Ah, right, these checks are for avx.

Harbormaster completed remote builds in B167776: Diff 434112.Jun 3 2022, 1:57 PM

icost.ll (1 KB)

Test case for collection.

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
3810	Note that it is free for splat shift value only. https://godbolt.org/z/KMf6sW5n3

In D126885#3557032, @vdmitrie wrote:

icost.ll (1 KB)

Test case for collection.

mca shows that these 2 instructions have the same cost, so it actually does not matter. It is probably worth adding some other instructions that can load params directly from memory for x86.

vdmitrie added inline comments.Jun 3 2022, 2:23 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5936	Just wondering: is it possible for UserTreeIndices to be empty here? AFAIU it can be empty for the root only, but constants do not seed the vtree. If the alternate opcodes are shl/shr but the shift value is a splat, it can still be an immediate for both of them.
5955	drop it?

ABataev added inline comments.Jun 3 2022, 2:42 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5936	It can be, if constants are the reduced values in reduction ops. That's why there is a TODO above.
5955	What do you mean?

vdmitrie added inline comments.Jun 3 2022, 2:47 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5936	Okay, although I believe it is not the SLP vectorizer's job to do constant folding.
5955	Drop the extra definition of ScalarCost. Otherwise the loop at line 5807 updates the variable from line 5802, which is not used; line 5811 will subtract the one defined at line 5795.

ABataev added inline comments.Jun 3 2022, 2:50 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5955	Ah, yes, sure.

vdmitrie added inline comments.Jun 3 2022, 3:14 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5960	Doesn't this interface already assume that a constant is a legal immediate? I was trying to explore this too and found that it does not seem to correctly cover 32-bit targets, specifically for 64-bit operations. Ideally we should have an interface that tells whether an immediate is a legal imm operand for a target, but I have not found anything like that. One way to figure this out (which I found, and which may be wrong): when the condition DL->getTypeStoreSizeInBits(ScalarTy) > DL->getLargestLegalIntTypeSizeInBits() is true, we cannot assume the operand is a legal immediate.

ABataev added inline comments.Jun 3 2022, 3:24 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5960	I'll check it.

ABataev added inline comments.Jun 6 2022, 9:02 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5936	Do you suggest to hide it in getConstBuildVectorInstrCost? And return the difference? Or just add a new member function?

vdmitrie added inline comments.Jun 6 2022, 11:34 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5936	Alternate opcodes are SLP-vectorizer specific. For that reason, trying to sink that logic into the TTI interface does not look like the right thing to do. But outlining this whole new code into a separate member is a good idea. What sounds weird for me is that constants may seed vtree for reduction. Although that is not directly related to this patch, you are placing workarounds for it here. IMO it is impractical to run constant reductions through the SLP vectorizer machinery. Probably, to make the workaround for that issue simpler in this patch, add an early return: if (E->UserTreeIndices.empty()) return 0; Otherwise it will return a memory-op cost for a foldable operation.
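The early return suggested in this comment can be sketched in isolation. This is a minimal toy, not the SLPVectorizer code: TreeEntry here is a hypothetical stand-in carrying only a UserTreeIndices field, and MemoryOpCost stands for whatever the caller would otherwise charge.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an SLP tree node; only the field used here.
struct TreeEntry {
  std::vector<int> UserTreeIndices; // indices of the nodes that use this one
};

// Toy cost of a constant buildvector node.
int constBuildVectorCost(const TreeEntry &E, int MemoryOpCost) {
  assert(MemoryOpCost >= 0 && "costs in this sketch are non-negative");
  // No users: the node is a root (e.g. constants seeding a reduction), so the
  // operation is foldable and nothing is charged.
  if (E.UserTreeIndices.empty())
    return 0;
  // Otherwise charge the load of the constant vector.
  return MemoryOpCost;
}
```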

Address comments

Herald added subscribers: kbarton, nemanjai. · Jun 8 2022, 8:19 AM

ABataev added inline comments.Jun 8 2022, 8:20 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

5936

What sounds weird for me is that constants may seed vtree for reduction.

InstCombiner and other passes are not always able to handle them (or require some extra work and compile time). E.g:

define i32 @foo(i32 %v, i32 %a) {
  %s1 = add i32 %v, 1
  %s2 = add i32 %a, 2
  %s3 = add i32 %s1, %s2
  %s11 = add i32 %v, %a
  %s31 = add i32 %s3, %s11
  %s4 = add i32 %v, 3
  %s5 = add i32 %a, 4
  %s6 = add i32 %s4, %s5
  %s7 = add i32 %s31, %s6
  ret i32 %s7
}

SLP is able to transform it to:

define i32 @foo(i32 %v, i32 %a) {
  %1 = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> <i32 4, i32 3, i32 2, i32 1>)
  %op.rdx = add i32 %a, %a
  %op.rdx1 = add i32 %a, %v
  %op.rdx2 = add i32 %v, %v
  %op.rdx3 = add i32 %op.rdx, %op.rdx1
  %op.rdx4 = add i32 %op.rdx3, %op.rdx2
  %op.rdx5 = add i32 %1, %op.rdx4
  ret i32 %op.rdx5
}

which can be optimized to

%1 = i32 10

But I agree, that it requires improvement. We don't need to estimate the cost and emit reduction here. I have a patch that improves it. Need to work on it for some time, though.
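As a sanity check on the example above, here is a small C++ sketch (not part of the patch) that mirrors both IR functions with wrapping 32-bit arithmetic. The names fooScalar, fooReduced and checkEquivalent are made up for illustration; both forms reduce to 3*v + 3*a + 10 modulo 2^32.

```cpp
#include <cassert>
#include <cstdint>

// Scalar form: one statement per `add i32` in the original function.
// uint32_t wraps on overflow, matching a plain `add i32`.
uint32_t fooScalar(uint32_t v, uint32_t a) {
  uint32_t s1 = v + 1;
  uint32_t s2 = a + 2;
  uint32_t s3 = s1 + s2;
  uint32_t s11 = v + a;
  uint32_t s31 = s3 + s11;
  uint32_t s4 = v + 3;
  uint32_t s5 = a + 4;
  uint32_t s6 = s4 + s5;
  return s31 + s6;
}

// Reduced form: the constants are summed first, as the vector reduction does.
uint32_t fooReduced(uint32_t v, uint32_t a) {
  const uint32_t lanes[4] = {4, 3, 2, 1};
  uint32_t r = 0;
  for (uint32_t lane : lanes)
    r += lane; // folds to the constant 10
  uint32_t rdx = a + a;
  uint32_t rdx1 = a + v;
  uint32_t rdx2 = v + v;
  uint32_t rdx3 = rdx + rdx1;
  uint32_t rdx4 = rdx3 + rdx2;
  return r + rdx4;
}

// Helper that checks one input pair.
void checkEquivalent(uint32_t v, uint32_t a) {
  assert(fooScalar(v, a) == fooReduced(v, a));
}
```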

Harbormaster completed remote builds in B168583: Diff 435177.Jun 8 2022, 10:16 AM

Rebase

Herald added a subscriber: nlopes. · Jun 14 2022, 1:41 PM

Harbormaster completed remote builds in B169816: Diff 436911.Jun 14 2022, 3:16 PM

RKSimon added inline comments.Jun 16 2022, 3:32 AM

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
3810	Yes, vector shifts must be splats without AVX2 or XOP
3830	Instruction::Add/Sub? Also, we'd need to allow Idx == 0 || Idx == 1 for commutable ops.

shchenz added a subscriber: shchenz.Jun 16 2022, 3:53 AM

ABataev added inline comments.Jun 16 2022, 10:17 AM

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
3830	I excluded Add/Sub here because scalar Add/Sub with an immediate has a lower cost than the vector Add/Sub (0.2-0.33 vs ~0.5). We can add it later; currently there is no such analysis in getIntImmCostInst.

Address comments

Harbormaster completed remote builds in B170350: Diff 437675.Jun 16 2022, 2:32 PM

Ping!

RKSimon added inline comments.Jun 22 2022, 10:10 AM

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
3823	XOP can memory fold from Idx == 0 as well.

Removed getConstantBuildVectorCost; the analysis for constant values already exists in getArithmeticInstrCost. Added support for a constant operand for stores in the getMemoryOpCost function.

Harbormaster completed remote builds in B171421: Diff 439156.Jun 22 2022, 3:09 PM

Rebase

Harbormaster completed remote builds in B171896: Diff 439821.Jun 24 2022, 11:47 AM

Rebase

Harbormaster completed remote builds in B172460: Diff 440594.Jun 28 2022, 7:13 AM

Rebase

Harbormaster completed remote builds in B180695: Diff 451880.Aug 11 2022, 10:35 AM

RKSimon added inline comments.Aug 15 2022, 4:15 AM

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
4108	Just do this once before the (!VTy || !LT.second.isVector()) check?

Address comment

Harbormaster completed remote builds in B181265: Diff 452648.Aug 15 2022, 7:30 AM

LGTM - it might be worth splitting the refactoring of adding the OperandValueKind arg to getMemoryOpCost? That way any fall out from the cost changes are more localised.

This revision is now accepted and ready to land.Aug 17 2022, 10:11 AM

In D126885#3729272, @RKSimon wrote:

LGTM - it might be worth splitting the refactoring of adding the OperandValueKind arg to getMemoryOpCost? That way any fall out from the cost changes are more localised.

Ok, will commit in a separate patch.

ABataev mentioned this in rGd53e245951f8: [COST][NFC]Introduce OperandValueKind in getMemoryOpCost, NFC..Aug 19 2022, 7:34 AM

This revision was landed with ongoing or failed builds.Aug 19 2022, 8:04 AM

Closed by commit rG0e7ed32c7136: [SLP]Cost for a constant buildvector. (authored by ABataev).

This revision was automatically updated to reflect the committed changes.

ABataev added a commit: rG0e7ed32c7136: [SLP]Cost for a constant buildvector..

Coming into this a bit late.

I stumbled into this myself when looking at the impact of SLP on RISC-V. I think this is addressing an important problem, but I'm really not happy with the structure of the change that landed.

We have a general problem here of needing to account for cost of a constant build vector. This change ended up being specific to stores of constant build vectors, but the same basic problem still exists if e.g. you have a load, add constant-build-vector, and store sequence which gets vectorized. The problem here is not in any way related to the cost of the store; it's related to the cost of materializing the value to be stored.

There's an additional problem that the cost model added for RISC-V is way overly simplistic. It's out of sync with the existing build vector lowering code, and thus will result in costs which differ from the actual lowering chosen. More importantly, the interface chosen in this patch prevents a more sophisticated cost model from being used.

I think we need to undo this, and return to the getConstBuildVectorInstrCost approach used in early versions of this patch. There was a mention of existing build vector costing in getConstBuildVectorInstrCost, but I can't find this in generic code. Can you point me to the code you were referring to?

p.s. I used the word "undo" specifically to avoid "revert". I'm not asking the change be reverted, simply that we work in the direction of a better interface overall. Doing that will have the effect of semantically reverting the landed change, but I'm not picky about the order of operations here.

In D126885#3735457, @reames wrote:

I think we need to undo this, and return to the getConstBuildVectorInstrCost approach used in early versions of this patch. There was a mention of existing build vector costing in getConstBuildVectorInstrCost, but I can't find this in generic code. Can you point me to the code you were referring to?

Check the cost model of arithmetic instructions etc, they already include the cost analysis for constant values.

p.s. I used the word "undo" specifically to avoid "revert". I'm not asking the change be reverted, simply that we work in the direction of a better interface overall. Doing that will have the effect of semantically reverting the landed change, but I'm not picky about the order of operations here.

I tried initially to implement it, but our cost model already includes the cost of constants/constant buildvectors for many operations. It requires a significant rework of TTI and some extra investigation, because we need to account for the cross dependency between constants and the operations. And I'm not sure it would be better/easier to implement; it requires some extra (re)design investigation.

In D126885#3735500, @ABataev wrote:

Check the cost model of arithmetic instructions etc, they already include the cost analysis for constant values.

I think I found the code you're referring to in X86TTIImpl::getArithmeticInstrCost. I'd summarize this code as we have various alternate cost tables which seem to assume one constant splat operand or sometimes just one constant operand gets folded into the instruction. I don't know enough about avx512 instruction encoding to reason about this, but I will accept that it exists. Though it does look very weird to me that *all* constants are assumed to be folded into the encoding? Whatever, out of scope for this discussion.

I tried initially to implement it, but our cost model already includes the cost of constants/constant buildvectors for many operations. It requires a significant rework of TTI and some extra investigation, because we need to account for the cross dependency between constants and the operations. And I'm not sure it would be better/easier to implement; it requires some extra (re)design investigation.

Ok, so I see the concern here. I'm not thrilled with the conclusion, but I think I agree that the current state of the art is having each operation reason about the cost of the constant materialization independently.

Given that, I see why you took the approach you did here.

However, we're still left with the problem that the current interface is insufficient for RISC-V. On the vector side, we can generate various non-splat sequences (e.g. vid and friends) at low cost. As such, the current expressibility of interface isn't really sufficient.

I see two possible paths, both with downsides. I'm curious what you think:

Extend the OperandValueProperties enum with a bunch more options for describing build vectors. I don't really see the semantic distinction between OperandValueProperties and OperandValueKind, so we'd probably end up merging them into a single info struct with a bunch more properties on it. This arguably works more naturally with scalable vectors, but it's a bunch of complexity.
Add the getConstBuildVectorInstrCost interface anyways. Document the contract as being to return zero cost when the constant could fold into the using instruction. Existing backends which don't need the additional expressibility continue with the old scheme, RISC-V uses this approach to cost build vectors instead (i.e. arithmetic cost et al don't include constant mat costs).

As I said, both approaches have some obvious downsides. If you have an alternate idea, definitely open to hearing it.

Also, to be clear, I've accepted that this patch is reasonable. I'm asking now about future direction for my own work, not asking for you to volunteer for any of the above. :)
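To make the second option concrete, here is a rough sketch of what such a contract could look like. It is a toy under loud assumptions, not a proposed TTI signature: the function names are hypothetical, the costs 1 and 3 are invented, and treating any constant-step sequence as cheap is only a stand-in for the "vid and friends" lowerings mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// True if the lanes form base, base+step, base+2*step, ...
// A splat is the special case step == 0. Intended for small lane values.
bool isStepSequence(const std::vector<int64_t> &Lanes) {
  if (Lanes.size() < 2)
    return true;
  const int64_t Step = Lanes[1] - Lanes[0];
  for (std::size_t I = 2; I < Lanes.size(); ++I)
    if (Lanes[I] - Lanes[I - 1] != Step)
      return false;
  return true;
}

// Hypothetical contract: zero when the constant folds into its user,
// otherwise the cost of materializing the vector.
int constBuildVectorCost(const std::vector<int64_t> &Lanes, bool FoldsIntoUser) {
  assert(!Lanes.empty() && "a build vector needs at least one lane");
  if (FoldsIntoUser)
    return 0; // the contract: folded constants are free
  if (isStepSequence(Lanes))
    return 1; // assumed cheap: splat or index-like sequence
  return 3;   // assumed: address materialization plus a vector load
}
```

The point of the sketch is that the function sees the actual lane values, which the OperandValueKind-based interface cannot express.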

In D126885#3735638, @reames wrote:

I see two possible paths, both with downsides. I'm curious what you think:

Extend the OperandValueProperties enum with a bunch more options for describing build vectors. I don't really see the semantic distinction between OperandValueProperties and OperandValueKind, so we'd probably end up merging them into a single info struct with a bunch more properties on it. This arguably works more naturally with scalable vectors, but it's a bunch of complexity.

Add the getConstBuildVectorInstrCost interface anyways. Document the contract as being to return zero cost when the constant could fold into the using instruction. Existing backends which don't need the additional expressibility continue with the old scheme, RISC-V uses this approach to cost build vectors instead (i.e. arithmetic cost et al don't include constant mat costs).

As I said, both approaches have some obvious downsides. If you have an alternate idea, definitely open to hearing it.

I would do both (in some way) as a first step. Introduce getConstBuildVectorInstrCost (local to the RISC-V TTI interface) and use it in TTI functions (I mean in getArithmeticInstrCost, getMemoryOpCost, etc.) for better constant build vector cost estimation (if the user provides operands or OperandValueProperties). Later we can make it public for all TTI interfaces. Thoughts?

Also, to be clear, I've accepted that this patch is reasonable. I'm asking now about future direction for my own work, not asking for you to volunteer for any of the above. :)

I understand, no problem.

Refactoring OperandValueKind/Properties from enums into a single properties list has come up several times (IIRC KnownNeverZero/KnownNeverNegative properties and even KnownBits/SignBits/Min+Max have been mentioned as useful for some cases).

TBH just merging them as an initial cleanup (and improving TargetTransformInfo::getOperandInfo) would be worth it and would make it easier for future changes.

In D126885#3735695, @ABataev wrote:

I would do both (in some way) as a first step. Introduce getConstBuildVectorInstrCost (local to the RISC-V TTI interface) and use it in TTI functions (I mean in getArithmeticInstrCost, getMemoryOpCost, etc.) for better constant build vector cost estimation (if the user provides operands or OperandValueProperties). Later we can make it public for all TTI interfaces. Thoughts?

I split out the costing code in 59960e8d with plans to extend it. However, this isn't quite the same as getConstBuildVectorInstrCost as we don't have the actual values forming the build vector. For that, we'd need to significantly change the interface of TTI to pass through all of the Values making up the build vector.

In D126885#3735736, @RKSimon wrote:

Refactoring OperandValueKind/Properties from enums into a single properties list has come up several times (IIRC KnownNeverZero/KnownNeverNegative properties and even KnownBits/SignBits/Min+Max have been mentioned as useful for some cases).

TBH just merging them as an initial cleanup (and improving TargetTransformInfo::getOperandInfo) would be worth it and would make it easier for future changes.

Given this has come up many times, I went ahead and did it. Changes to wrap both existing property enums in a class have been plumbed through all of TTI and client code. If we want to add new properties, it should be pretty straightforward to do so.

I'm still not sure this is the right direction overall - as opposed to costing the actual constant value - but I'm still toying with ideas here. One thing I did notice is that basically only X86 costs immediate operands with the current approach. So this really isn't "existing targets do X"; it's "most targets ignore this issue, and X86 does X".
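For readers unfamiliar with the refactoring, wrapping the two enums in one class might look roughly like the standalone sketch below. The enumerator names echo the TTI ones used in this review, but the struct name OperandInfo, its helpers and the client function are assumptions for illustration and are not copied from LLVM's headers.

```cpp
#include <cassert>

// Standalone copies for the sketch; not LLVM's definitions.
enum OperandValueKind {
  OK_AnyValue,
  OK_UniformValue,
  OK_UniformConstantValue,
  OK_NonUniformConstantValue
};
enum OperandValueProperties { OP_None, OP_PowerOf2 };

// Hypothetical wrapper that carries both pieces of information together.
struct OperandInfo {
  OperandValueKind Kind = OK_AnyValue;
  OperandValueProperties Properties = OP_None;

  bool isConstant() const {
    return Kind == OK_UniformConstantValue ||
           Kind == OK_NonUniformConstantValue;
  }
  bool isUniform() const {
    return Kind == OK_UniformValue || Kind == OK_UniformConstantValue;
  }
  bool isPowerOf2() const { return Properties == OP_PowerOf2; }
};

// Example client: extra cost charged only when the operand is a constant.
int constantStoreSurcharge(const OperandInfo &Info, int MaterializationCost) {
  assert(MaterializationCost >= 0 && "costs in this sketch are non-negative");
  return Info.isConstant() ? MaterializationCost : 0;
}
```

A new property then becomes one more field or helper on the struct rather than another parameter threaded through every cost hook.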

FYI, this change causes a regression with Flang due to store-forwarding issues. I am not sure if it is Flang-specific - please take a look: https://github.com/llvm/llvm-project/issues/57322

reames mentioned this in D132566: [SLP] Fix cost model w.r.t. operand properties.Aug 24 2022, 8:12 AM

reames mentioned this in D132680: [RISCV] Disable SLP vectorization by default due to unresolved profitability issues.Aug 25 2022, 10:48 AM

FYI this change caused a noticeable compile-time regression: http://llvm-compile-time-tracker.com/compare.php?from=31fbcccb3136b9da99e7bc95007e553403fcd641&to=0e7ed32c71362f3547329c6ee8573a8bc191f58a&stat=instructions Highest impact seems to be 7.5% on constants.c from mafft. I don't see anything obvious that can be optimized here though.

In D126885#3755038, @nikic wrote:

FYI this change caused a noticeable compile-time regression: http://llvm-compile-time-tracker.com/compare.php?from=31fbcccb3136b9da99e7bc95007e553403fcd641&to=0e7ed32c71362f3547329c6ee8573a8bc191f58a&stat=instructions Highest impact seems to be 7.5% on constants.c from mafft. I don't see anything obvious that can be optimized here though.

Hopefully D132750 will fix it in some FP cases.

reames mentioned this in rG42ef5720493e: [SLP] Fix cost model w.r.t. operand properties.Sep 23 2022, 8:40 AM

Revision Contents

Path	Size
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp	15 lines
llvm/lib/Target/X86/X86TargetTransformInfo.cpp	19 lines
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	6 lines
llvm/test/Analysis/CostModel/X86/arith-fp.ll	78 lines
llvm/test/Transforms/SLPVectorizer/RISCV/rvv-min-vector-size.ll	15 lines
llvm/test/Transforms/SLPVectorizer/X86/crash_bullet.ll	13 lines

Diff 454002

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

	Show First 20 Lines • Show All 434 Lines • ▼ Show 20 Lines
	}			}

	InstructionCost RISCVTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,			InstructionCost RISCVTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,
	MaybeAlign Alignment,			MaybeAlign Alignment,
	unsigned AddressSpace,			unsigned AddressSpace,
	TTI::TargetCostKind CostKind,			TTI::TargetCostKind CostKind,
	TTI::OperandValueKind OpdInfo,			TTI::OperandValueKind OpdInfo,
	const Instruction *I) {			const Instruction *I) {
	return BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace, CostKind,			InstructionCost Cost = 0;
	OpdInfo, I);			if (Opcode == Instruction::Store && isa<VectorType>(Src) &&
				(OpdInfo == TTI::OK_UniformConstantValue ||
				OpdInfo == TTI::OK_NonUniformConstantValue)) {
				APInt PseudoAddr = APInt::getAllOnes(DL.getPointerSizeInBits());
				// Add a cost of address load + the cost of the vector load.
				Cost += RISCVMatInt::getIntMatCost(PseudoAddr, DL.getPointerSizeInBits(),
				getST()->getFeatureBits()) +
				getMemoryOpCost(Instruction::Load, Src, DL.getABITypeAlign(Src),
				/*AddressSpace=*/0, CostKind);
				}
				return Cost + BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace,
				CostKind, OpdInfo, I);
	}			}

	void RISCVTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE,			void RISCVTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE,
	TTI::UnrollingPreferences &UP,			TTI::UnrollingPreferences &UP,
	OptimizationRemarkEmitter *ORE) {			OptimizationRemarkEmitter *ORE) {
	// TODO: More tuning on benchmarks and metrics with changes as needed			// TODO: More tuning on benchmarks and metrics with changes as needed
	// would apply to all settings below to enable performance.			// would apply to all settings below to enable performance.


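As a rough illustration of the RISC-V change above: when a constant vector is stored, the patched getMemoryOpCost adds the cost of materializing the constant-pool address and the cost of a vector load on top of the base memory-op cost. The sketch below models only that arithmetic. The struct, its fields, and the function name are hypothetical stand-ins rather than LLVM APIs, and all numeric costs are caller-supplied placeholders, not values taken from the cost tables.

```cpp
#include <cassert>

// Hypothetical inputs standing in for the real TTI queries in the diff.
struct MemOpCostInputs {
  bool IsStore;                // Opcode == Instruction::Store
  bool IsVector;               // isa<VectorType>(Src)
  bool ValueIsConstant;        // OK_UniformConstantValue or OK_NonUniformConstantValue
  int AddrMaterializationCost; // stand-in for RISCVMatInt::getIntMatCost(...)
  int VectorLoadCost;          // stand-in for the recursive Load cost query
  int BaseMemOpCost;           // stand-in for BaseT::getMemoryOpCost(...)
};

// Mirrors the shape of the patched function: the extra cost is only added
// for stores of constant vectors, and the base cost is always included.
int memoryOpCostSketch(const MemOpCostInputs &In) {
  assert(In.BaseMemOpCost >= 0 && "costs are modelled as non-negative");
  int Cost = 0;
  if (In.IsStore && In.IsVector && In.ValueIsConstant)
    Cost += In.AddrMaterializationCost + In.VectorLoadCost;
  return Cost + In.BaseMemOpCost;
}
```

With placeholder costs of 2 (address), 1 (load) and 1 (store), a constant vector store would be modelled as 4, while the same store of a non-constant value stays at 1.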
llvm/lib/Target/X86/X86TargetTransformInfo.cpp

if (LT.second.isVector() && (ISD == ISD::SDIV || ISD == ISD::SREM ||		if (LT.second.isVector() && (ISD == ISD::SDIV || ISD == ISD::SREM ||
ISD == ISD::UDIV || ISD == ISD::UREM)) {		ISD == ISD::UDIV || ISD == ISD::UREM)) {
InstructionCost ScalarCost = getArithmeticInstrCost(		InstructionCost ScalarCost = getArithmeticInstrCost(
Opcode, Ty->getScalarType(), CostKind, Op1Info, Op2Info,		Opcode, Ty->getScalarType(), CostKind, Op1Info, Op2Info,
TargetTransformInfo::OP_None, TargetTransformInfo::OP_None);		TargetTransformInfo::OP_None, TargetTransformInfo::OP_None);
return 20 * LT.first * LT.second.getVectorNumElements() * ScalarCost;		return 20 * LT.first * LT.second.getVectorNumElements() * ScalarCost;
}		}

// Fallback to the default implementation.		// Fallback to the default implementation.
return BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Op1Info, Op2Info);		return BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Op1Info, Op2Info,
		Opd1PropInfo, Opd2PropInfo, Args, CxtI);
}		}

InstructionCost X86TTIImpl::getShuffleCost(TTI::ShuffleKind Kind,		InstructionCost X86TTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
VectorType *BaseTp,		VectorType *BaseTp,
ArrayRef<int> Mask, int Index,		ArrayRef<int> Mask, int Index,
VectorType *SubTp,		VectorType *SubTp,
ArrayRef<const Value *> Args) {		ArrayRef<const Value *> Args) {
// 64-bit packed float vectors (v2f32) are widened to type v4f32.		// 64-bit packed float vectors (v2f32) are widened to type v4f32.
...
InstructionCost X86TTIImpl::getScalarizationOverhead(VectorType *Ty,		InstructionCost X86TTIImpl::getScalarizationOverhead(VectorType *Ty,
const APInt &DemandedElts,		const APInt &DemandedElts,
bool Insert,		bool Insert,
bool Extract) {		bool Extract) {
assert(DemandedElts.getBitWidth() ==		assert(DemandedElts.getBitWidth() ==
cast<FixedVectorType>(Ty)->getNumElements() &&		cast<FixedVectorType>(Ty)->getNumElements() &&
"Vector size mismatch");		"Vector size mismatch");

std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Ty);		std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Ty);
MVT MScalarTy = LT.second.getScalarType();		MVT MScalarTy = LT.second.getScalarType();
		vdmitrie: Note that it is free only for a splat shift value. https://godbolt.org/z/KMf6sW5n3
		RKSimon: Yes, vector shifts must be splats without AVX2 or XOP.
unsigned SizeInBits = LT.second.getSizeInBits();		unsigned SizeInBits = LT.second.getSizeInBits();

InstructionCost Cost = 0;		InstructionCost Cost = 0;

// For insertions, a ISD::BUILD_VECTOR style vector initialization can be much		// For insertions, a ISD::BUILD_VECTOR style vector initialization can be much
// cheaper than an accumulation of ISD::INSERT_VECTOR_ELT.		// cheaper than an accumulation of ISD::INSERT_VECTOR_ELT.
if (Insert) {		if (Insert) {
if ((MScalarTy == MVT::i16 && ST->hasSSE2()) ||		if ((MScalarTy == MVT::i16 && ST->hasSSE2()) ||
(MScalarTy.isInteger() && ST->hasSSE41()) ||		(MScalarTy.isInteger() && ST->hasSSE41()) ||
(MScalarTy == MVT::f32 && ST->hasSSE41())) {		(MScalarTy == MVT::f32 && ST->hasSSE41())) {
// For types we can insert directly, insertion into 128-bit sub vectors is		// For types we can insert directly, insertion into 128-bit sub vectors is
// cheap, followed by a cheap chain of concatenations.		// cheap, followed by a cheap chain of concatenations.
if (SizeInBits <= 128) {		if (SizeInBits <= 128) {
		RKSimon: XOP can memory fold from Idx == 0 as well.
Cost +=		Cost +=
BaseT::getScalarizationOverhead(Ty, DemandedElts, Insert, false);		BaseT::getScalarizationOverhead(Ty, DemandedElts, Insert, false);
} else {		} else {
// In each 128-lane, if at least one index is demanded but not all		// In each 128-lane, if at least one index is demanded but not all
// indices are demanded and this 128-lane is not the first 128-lane of		// indices are demanded and this 128-lane is not the first 128-lane of
// the legalized-vector, then this 128-lane needs a extracti128; If in		// the legalized-vector, then this 128-lane needs a extracti128; If in
// each 128-lane, there is at least one demanded index, this 128-lane		// each 128-lane, there is at least one demanded index, this 128-lane
		RKSimon: Instruction::Add/Sub? Also, we'd need to allow Idx == 0 || Idx == 1 for commutable ops.
		ABataev: 1. I excluded Add/Sub here because scalar Add/Sub with Imm has a lower cost than the vector Add/Sub (0.2-0.33 vs ~0.5). 2. We can add it later; currently there is no such analysis in getIntImmCostInst.
// needs a inserti128.		// needs a inserti128.

// The following cases will help you build a better understanding:		// The following cases will help you build a better understanding:
// Assume we insert several elements into a v8i32 vector in avx2,		// Assume we insert several elements into a v8i32 vector in avx2,
// Case#1: inserting into 1th index needs vpinsrd + inserti128.		// Case#1: inserting into 1th index needs vpinsrd + inserti128.
// Case#2: inserting into 5th index needs extracti128 + vpinsrd +		// Case#2: inserting into 5th index needs extracti128 + vpinsrd +
// inserti128.		// inserti128.
// Case#3: inserting into 4,5,6,7 index needs 4*vpinsrd + inserti128.		// Case#3: inserting into 4,5,6,7 index needs 4*vpinsrd + inserti128.
...
if (TLI->getValueType(DL, Src, true) == MVT::Other)		if (TLI->getValueType(DL, Src, true) == MVT::Other)
return BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace,		return BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace,
CostKind);		CostKind);

// Legalize the type.		// Legalize the type.
std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Src);		std::pair<InstructionCost, MVT> LT = getTypeLegalizationCost(Src);

auto *VTy = dyn_cast<FixedVectorType>(Src);		auto *VTy = dyn_cast<FixedVectorType>(Src);

		InstructionCost Cost = 0;

		// Add a cost for constant load to vector.
		if (Opcode == Instruction::Store &&
		(OpdInfo == TTI::OK_UniformConstantValue ||
		OpdInfo == TTI::OK_NonUniformConstantValue))
		Cost += getMemoryOpCost(Instruction::Load, Src, DL.getABITypeAlign(Src),
		/*AddressSpace=*/0, CostKind);

// Handle the simple case of non-vectors.		// Handle the simple case of non-vectors.
// NOTE: this assumes that legalization never creates vector from scalars!		// NOTE: this assumes that legalization never creates vector from scalars!
if (!VTy || !LT.second.isVector())		if (!VTy || !LT.second.isVector()) {
// Each load/store unit costs 1.		// Each load/store unit costs 1.
return LT.first * 1;		return (LT.second.isFloatingPoint() ? Cost : 0) + LT.first * 1;
		}

bool IsLoad = Opcode == Instruction::Load;		bool IsLoad = Opcode == Instruction::Load;

Type *EltTy = VTy->getElementType();		Type *EltTy = VTy->getElementType();

const int EltTyBits = DL.getTypeSizeInBits(EltTy);		const int EltTyBits = DL.getTypeSizeInBits(EltTy);

InstructionCost Cost = 0;

// Source of truth: how many elements were there in the original IR vector?		// Source of truth: how many elements were there in the original IR vector?
const unsigned SrcNumElt = VTy->getNumElements();		const unsigned SrcNumElt = VTy->getNumElements();

// How far have we gotten?		// How far have we gotten?
int NumEltRemaining = SrcNumElt;		int NumEltRemaining = SrcNumElt;
// Note that we intentionally capture by-reference, NumEltRemaining changes.		// Note that we intentionally capture by-reference, NumEltRemaining changes.
		RKSimon: Just do this once before the (!VTy || !LT.second.isVector()) check?
auto NumEltDone = [&]() { return SrcNumElt - NumEltRemaining; };		auto NumEltDone = [&]() { return SrcNumElt - NumEltRemaining; };

const int MaxLegalOpSizeBytes = divideCeil(LT.second.getSizeInBits(), 8);		const int MaxLegalOpSizeBytes = divideCeil(LT.second.getSizeInBits(), 8);

// Note that even if we can store 64 bits of an XMM, we still operate on XMM.		// Note that even if we can store 64 bits of an XMM, we still operate on XMM.
const unsigned XMMBits = 128;		const unsigned XMMBits = 128;
if (XMMBits % EltTyBits != 0)		if (XMMBits % EltTyBits != 0)
// Vector size must be a multiple of the element size. I.e. no padding.		// Vector size must be a multiple of the element size. I.e. no padding.

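The 128-bit-lane rule spelled out in the getScalarizationOverhead comment above can be checked against its three v8i32 cases with a small stand-alone model. This is only a sketch of the rule as the comment words it (an extracti128 for a partially demanded lane that is not the first lane, an inserti128 for every lane with at least one demanded index); it is not the actual X86 cost code, and the LaneOps struct and countLaneOps function are hypothetical names introduced for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical tally of the operations named in the source comment.
struct LaneOps {
  unsigned ElementInserts = 0; // e.g. vpinsrd, one per demanded element
  unsigned Extract128 = 0;     // extracti128
  unsigned Insert128 = 0;      // inserti128
};

// DemandedMask: bit I set means element I is being inserted.
// EltsPerLane: number of elements in one 128-bit lane (4 for i32).
LaneOps countLaneOps(uint32_t DemandedMask, unsigned NumElts,
                     unsigned EltsPerLane) {
  assert(EltsPerLane > 0 && EltsPerLane < 32 && NumElts <= 32 &&
         NumElts % EltsPerLane == 0 && "unsupported shape for this sketch");
  LaneOps Ops;
  const uint32_t FullLane = (1u << EltsPerLane) - 1;
  for (unsigned Lane = 0; Lane * EltsPerLane < NumElts; ++Lane) {
    const uint32_t InLane = (DemandedMask >> (Lane * EltsPerLane)) & FullLane;
    if (InLane == 0)
      continue; // nothing demanded in this lane
    Ops.ElementInserts +=
        static_cast<unsigned>(std::bitset<32>(InLane).count());
    // Partially demanded lane that is not the first lane: extract it first.
    if (InLane != FullLane && Lane != 0)
      ++Ops.Extract128;
    // Any lane with a demanded index is written back with an insert.
    ++Ops.Insert128;
  }
  return Ops;
}
```

For a v8i32 vector (NumElts = 8, EltsPerLane = 4), mask 0x02 models Case#1, mask 0x20 models Case#2, and mask 0xF0 models Case#3.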
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

if (E->State == TreeEntry::NeedToGather) {		if (E->State == TreeEntry::NeedToGather) {
if (Shuffle) {		if (Shuffle) {
InstructionCost GatherCost = 0;		InstructionCost GatherCost = 0;
if (ShuffleVectorInst::isIdentityMask(Mask)) {		if (ShuffleVectorInst::isIdentityMask(Mask)) {
// Perfect match in the graph, will reuse the previously vectorized		// Perfect match in the graph, will reuse the previously vectorized
// node. Cost is 0.		// node. Cost is 0.
LLVM_DEBUG(		LLVM_DEBUG(
dbgs()		dbgs()
<< "SLP: perfect diamond match for gather bundle that starts with "		<< "SLP: perfect diamond match for gather bundle that starts with "
<< *VL.front() << ".\n");		<< *VL.front() << ".\n");
		vdmitrie: This estimate should be a bit more complicated. Here are the things that can additionally be considered:
		- For scalar floating-point ops, a constant operand is normally loaded from memory too.
		- If it is an operand of an instruction where it becomes an immediate (like a shift value) and is a splat, the cost is zero.
		- For a scalar integer op, a constant operand is typically an immediate, so this estimate works in most cases, but there is an exception: 64-bit operations on a 32-bit target. That should be taken into account too.
if (NeedToShuffleReuses)		if (NeedToShuffleReuses)
GatherCost =		GatherCost =
TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,		TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,
FinalVecTy, E->ReuseShuffleIndices);		FinalVecTy, E->ReuseShuffleIndices);
		vdmitrie: 1. Just wondering: is it possible for UserTreeIndices to be empty here? AFAIU it can be for the root only, but constants do not seed the vtree. 2. If alternate opcodes are shl/shr but the shift value is a splat, it can still be an immediate for both of them.
		ABataev: 1. If constants are reduced values in reduction ops. 2. That's why there is a TODO above.
		vdmitrie: Okay, although I believe it is not the SLP vectorizer's job to do constant folding.
		ABataev: Do you suggest hiding it in getConstBuildVectorInstrCost and returning the difference? Or just adding a new member function?
		vdmitrie: Alternate opcodes are SLP vectorizer specific. For that reason, sinking that logic into the TTI interface does not look like the right thing to do. But outlining this whole new code into a separate member is a good idea. What sounds weird to me is that constants may seed the vtree for a reduction. Although that is not directly related to this patch, you are placing workarounds for it here. IMO it is impractical to run a constant reduction through the SLP vectorizer machinery. Probably, to make the workaround simpler in this patch, add an early return: if (E->UserTreeIndices.empty()) return 0; Otherwise it will return a memory-op cost for a foldable operation.
		ABataev:
		> What sounds weird for me is that constants may seed vtree for reduction.
		InstCombiner and other passes are not always able to handle them (or require some extra work and compile time). E.g.:
		define i32 @foo(i32 %v, i32 %a) {
		  %s1 = add i32 %v, 1
		  %s2 = add i32 %a, 2
		  %s3 = add i32 %s1, %s2
		  %s11 = add i32 %v, %a
		  %s31 = add i32 %s3, %s11
		  %s4 = add i32 %v, 3
		  %s5 = add i32 %a, 4
		  %s6 = add i32 %s4, %s5
		  %s7 = add i32 %s31, %s6
		  ret i32 %s7
		}
		SLP is able to transform it to:
		define i32 @foo(i32 %v, i32 %a) {
		  %1 = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> <i32 4, i32 3, i32 2, i32 1>)
		  %op.rdx = add i32 %a, %a
		  %op.rdx1 = add i32 %a, %v
		  %op.rdx2 = add i32 %v, %v
		  %op.rdx3 = add i32 %op.rdx, %op.rdx1
		  %op.rdx4 = add i32 %op.rdx3, %op.rdx2
		  %op.rdx5 = add i32 %1, %op.rdx4
		  ret i32 %op.rdx5
		}
		which can be optimized to %1 = i32 10.
		But I agree that it requires improvement. We don't need to estimate the cost and emit a reduction here. I have a patch that improves it, but I need to work on it for some time.
} else {		} else {
LLVM_DEBUG(dbgs() << "SLP: shuffled " << Entries.size()		LLVM_DEBUG(dbgs() << "SLP: shuffled " << Entries.size()
<< " entries for bundle that starts with "		<< " entries for bundle that starts with "
<< *VL.front() << ".\n");		<< *VL.front() << ".\n");
// Detected that instead of gather we can emit a shuffle of single/two		// Detected that instead of gather we can emit a shuffle of single/two
// previously vectorized nodes. Add the cost of the permutation rather		// previously vectorized nodes. Add the cost of the permutation rather
// than gather.		// than gather.
::addMask(Mask, E->ReuseShuffleIndices);		::addMask(Mask, E->ReuseShuffleIndices);
GatherCost = TTI->getShuffleCost(*Shuffle, FinalVecTy, Mask);		GatherCost = TTI->getShuffleCost(*Shuffle, FinalVecTy, Mask);
}		}
return GatherCost;		return GatherCost;
}		}
if ((E->getOpcode() == Instruction::ExtractElement ||		if ((E->getOpcode() == Instruction::ExtractElement ||
all_of(E->Scalars,		all_of(E->Scalars,
[](Value *V) {		[](Value *V) {
return isa<ExtractElementInst, UndefValue>(V);		return isa<ExtractElementInst, UndefValue>(V);
})) &&		})) &&
allSameType(VL)) {		allSameType(VL)) {
// Check that gather of extractelements can be represented as just a		// Check that gather of extractelements can be represented as just a
		vdmitrie: Drop it?
		ABataev: What do you mean?
		vdmitrie: Drop the extra definition of ScalarCost. Otherwise the loop at line 5807 updates the variable from line 5802, which is not used; line 5811 will subtract the one defined at line 5795.
		ABataev: Ah, yes, sure.
// shuffle of a single/two vectors the scalars are extracted from.		// shuffle of a single/two vectors the scalars are extracted from.
SmallVector<int> Mask;		SmallVector<int> Mask;
Optional<TargetTransformInfo::ShuffleKind> ShuffleKind =		Optional<TargetTransformInfo::ShuffleKind> ShuffleKind =
isFixedVectorShuffle(VL, Mask);		isFixedVectorShuffle(VL, Mask);
if (ShuffleKind) {		if (ShuffleKind) {
		vdmitrie: Doesn't this interface already assume that a constant is a legal immediate? I was trying to explore this too, and I found that it does not seem to correctly cover 32-bit targets, specifically for 64-bit operations. Ideally we should have an interface that tells whether an immediate is a legal imm operand for a target, but I have not found anything like that. One way to figure this out (which I found, and which may be wrong) is: when the condition DL->getTypeStoreSizeInBits(ScalarTy) > DL->getLargestLegalIntTypeSizeInBits() is true, we cannot assume the operand is a legal immediate.
		ABataev: I'll check it.
// Found the bunch of extractelement instructions that must be gathered		// Found the bunch of extractelement instructions that must be gathered
// into a vector and can be represented as a permutation elements in a		// into a vector and can be represented as a permutation elements in a
// single input vector or of 2 input vectors.		// single input vector or of 2 input vectors.
InstructionCost Cost =		InstructionCost Cost =
computeExtractCost(VL, VecTy, ShuffleKind, Mask, TTI);		computeExtractCost(VL, VecTy, ShuffleKind, Mask, TTI);
AdjustExtractsCost(Cost);		AdjustExtractsCost(Cost);
if (NeedToShuffleReuses)		if (NeedToShuffleReuses)
Cost += TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,		Cost += TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,
...
case Instruction::Xor: {		case Instruction::Xor: {
SmallVector<const Value *, 4> Operands(VL0->operand_values());		SmallVector<const Value *, 4> Operands(VL0->operand_values());
InstructionCost ScalarEltCost =		InstructionCost ScalarEltCost =
TTI->getArithmeticInstrCost(E->getOpcode(), ScalarTy, CostKind, Op1VK,		TTI->getArithmeticInstrCost(E->getOpcode(), ScalarTy, CostKind, Op1VK,
Op2VK, Op1VP, Op2VP, Operands, VL0);		Op2VK, Op1VP, Op2VP, Operands, VL0);
if (NeedToShuffleReuses) {		if (NeedToShuffleReuses) {
CommonCost -= (EntryVF - VL.size()) * ScalarEltCost;		CommonCost -= (EntryVF - VL.size()) * ScalarEltCost;
}		}
InstructionCost ScalarCost = VecTy->getNumElements() * ScalarEltCost;		InstructionCost ScalarCost = VecTy->getNumElements() * ScalarEltCost;
		for (unsigned I = 0, Num = VL0->getNumOperands(); I < Num; ++I) {
		if (all_of(VL, [I](Value *V) {
		return isConstant(cast<Instruction>(V)->getOperand(I));
		}))
		Operands[I] = ConstantVector::getNullValue(VecTy);
		}
InstructionCost VecCost =		InstructionCost VecCost =
TTI->getArithmeticInstrCost(E->getOpcode(), VecTy, CostKind, Op1VK,		TTI->getArithmeticInstrCost(E->getOpcode(), VecTy, CostKind, Op1VK,
Op2VK, Op1VP, Op2VP, Operands, VL0);		Op2VK, Op1VP, Op2VP, Operands, VL0);
LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecCost, ScalarCost));		LLVM_DEBUG(dumpTreeCosts(E, CommonCost, VecCost, ScalarCost));
return CommonCost + VecCost - ScalarCost;		return CommonCost + VecCost - ScalarCost;
}		}
case Instruction::GetElementPtr: {		case Instruction::GetElementPtr: {
TargetTransformInfo::OperandValueKind Op1VK =		TargetTransformInfo::OperandValueKind Op1VK =

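The review comments on SLPVectorizer.cpp above list several conditions under which a constant operand is or is not expected to cost a memory load. A minimal sketch of those conditions follows, assuming a simplified model in which the caller already knows the scalar kind, the bit widths, and whether the operand is a splat used in an immediate position. ConstOperand and both functions are hypothetical names, not part of the TTI interface, and the width comparison is the heuristic vdmitrie proposes while noting that it may be wrong.

```cpp
#include <cassert>

enum class ScalarKind { Integer, FloatingPoint };

// Hypothetical description of one constant operand.
struct ConstOperand {
  ScalarKind Kind;
  unsigned ScalarBits;          // stand-in for getTypeStoreSizeInBits(ScalarTy)
  unsigned LargestLegalIntBits; // stand-in for getLargestLegalIntTypeSizeInBits()
  bool IsSplat;                 // all vector elements are the same constant
  bool UsedAsImmediateOperand;  // e.g. a shift amount
};

// Scalar side: is the constant expected to come from memory rather than
// being encoded as an immediate?
bool scalarConstantNeedsLoad(const ConstOperand &Op) {
  assert(Op.ScalarBits > 0 && Op.LargestLegalIntBits > 0);
  // Scalar FP constants are normally loaded from memory too.
  if (Op.Kind == ScalarKind::FloatingPoint)
    return true;
  // Integer constants are typically immediates, except when the scalar is
  // wider than the largest legal integer (64-bit ops on a 32-bit target).
  return Op.ScalarBits > Op.LargestLegalIntBits;
}

// Vector side: is the constant build vector expected to be free?
bool vectorConstantIsFree(const ConstOperand &Op) {
  // Free only when it is a splat in an immediate position.
  return Op.UsedAsImmediateOperand && Op.IsSplat;
}
```

Keeping the scalar and vector questions in separate predicates follows the discussion: the patch charges the vector side for a constant-pool load, and the reviewers point out cases where the scalar side pays a comparable cost or the vector side pays none.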
llvm/test/Analysis/CostModel/X86/arith-fp.ll

;		;
%V8F64 = fdiv <8 x double> undef, undef		%V8F64 = fdiv <8 x double> undef, undef

ret i32 undef		ret i32 undef
}		}

define i32 @frem(i32 %arg) {		define i32 @frem(i32 %arg) {
; SSE1-LABEL: 'frem'		; SSE1-LABEL: 'frem'
; SSE1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef		; SSE1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef
; SSE1-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V4F32 = frem <4 x float> undef, undef		; SSE1-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V4F32 = frem <4 x float> undef, undef
; SSE1-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V8F32 = frem <8 x float> undef, undef		; SSE1-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %V8F32 = frem <8 x float> undef, undef
; SSE1-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V16F32 = frem <16 x float> undef, undef		; SSE1-NEXT: Cost Model: Found an estimated cost of 44 for instruction: %V16F32 = frem <16 x float> undef, undef
; SSE1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef		; SSE1-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef
; SSE1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V2F64 = frem <2 x double> undef, undef		; SSE1-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %V2F64 = frem <2 x double> undef, undef
; SSE1-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V4F64 = frem <4 x double> undef, undef		; SSE1-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V4F64 = frem <4 x double> undef, undef
; SSE1-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V8F64 = frem <8 x double> undef, undef		; SSE1-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V8F64 = frem <8 x double> undef, undef
; SSE1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; SSE1-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
; SSE2-LABEL: 'frem'		; SSE2-LABEL: 'frem'
; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef		; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef
; SSE2-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V4F32 = frem <4 x float> undef, undef		; SSE2-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V4F32 = frem <4 x float> undef, undef
; SSE2-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V8F32 = frem <8 x float> undef, undef		; SSE2-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %V8F32 = frem <8 x float> undef, undef
; SSE2-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V16F32 = frem <16 x float> undef, undef		; SSE2-NEXT: Cost Model: Found an estimated cost of 44 for instruction: %V16F32 = frem <16 x float> undef, undef
; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef		; SSE2-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef
; SSE2-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2F64 = frem <2 x double> undef, undef		; SSE2-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V2F64 = frem <2 x double> undef, undef
; SSE2-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V4F64 = frem <4 x double> undef, undef		; SSE2-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V4F64 = frem <4 x double> undef, undef
; SSE2-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V8F64 = frem <8 x double> undef, undef		; SSE2-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V8F64 = frem <8 x double> undef, undef
; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; SSE2-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
; SSE42-LABEL: 'frem'		; SSE42-LABEL: 'frem'
; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef		; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef
; SSE42-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V4F32 = frem <4 x float> undef, undef		; SSE42-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V4F32 = frem <4 x float> undef, undef
; SSE42-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V8F32 = frem <8 x float> undef, undef		; SSE42-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %V8F32 = frem <8 x float> undef, undef
; SSE42-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V16F32 = frem <16 x float> undef, undef		; SSE42-NEXT: Cost Model: Found an estimated cost of 44 for instruction: %V16F32 = frem <16 x float> undef, undef
; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef		; SSE42-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef
; SSE42-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2F64 = frem <2 x double> undef, undef		; SSE42-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V2F64 = frem <2 x double> undef, undef
; SSE42-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V4F64 = frem <4 x double> undef, undef		; SSE42-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V4F64 = frem <4 x double> undef, undef
; SSE42-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V8F64 = frem <8 x double> undef, undef		; SSE42-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V8F64 = frem <8 x double> undef, undef
; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; SSE42-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
; AVX-LABEL: 'frem'		; AVX-LABEL: 'frem'
; AVX-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef		; AVX-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef
; AVX-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V4F32 = frem <4 x float> undef, undef		; AVX-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V4F32 = frem <4 x float> undef, undef
; AVX-NEXT: Cost Model: Found an estimated cost of 31 for instruction: %V8F32 = frem <8 x float> undef, undef		; AVX-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V8F32 = frem <8 x float> undef, undef
; AVX-NEXT: Cost Model: Found an estimated cost of 62 for instruction: %V16F32 = frem <16 x float> undef, undef		; AVX-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V16F32 = frem <16 x float> undef, undef
; AVX-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef		; AVX-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef
; AVX-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2F64 = frem <2 x double> undef, undef		; AVX-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V2F64 = frem <2 x double> undef, undef
; AVX-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V4F64 = frem <4 x double> undef, undef		; AVX-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V4F64 = frem <4 x double> undef, undef
; AVX-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V8F64 = frem <8 x double> undef, undef		; AVX-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %V8F64 = frem <8 x double> undef, undef
; AVX-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; AVX-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
; AVX512-LABEL: 'frem'		; AVX512-LABEL: 'frem'
; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef		; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef
; AVX512-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V4F32 = frem <4 x float> undef, undef		; AVX512-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V4F32 = frem <4 x float> undef, undef
; AVX512-NEXT: Cost Model: Found an estimated cost of 31 for instruction: %V8F32 = frem <8 x float> undef, undef		; AVX512-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V8F32 = frem <8 x float> undef, undef
; AVX512-NEXT: Cost Model: Found an estimated cost of 63 for instruction: %V16F32 = frem <16 x float> undef, undef		; AVX512-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V16F32 = frem <16 x float> undef, undef
; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef		; AVX512-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef
; AVX512-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2F64 = frem <2 x double> undef, undef		; AVX512-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V2F64 = frem <2 x double> undef, undef
; AVX512-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V4F64 = frem <4 x double> undef, undef		; AVX512-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V4F64 = frem <4 x double> undef, undef
; AVX512-NEXT: Cost Model: Found an estimated cost of 30 for instruction: %V8F64 = frem <8 x double> undef, undef		; AVX512-NEXT: Cost Model: Found an estimated cost of 23 for instruction: %V8F64 = frem <8 x double> undef, undef
; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; AVX512-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
; SLM-LABEL: 'frem'		; SLM-LABEL: 'frem'
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef		; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef
; SLM-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V4F32 = frem <4 x float> undef, undef		; SLM-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V4F32 = frem <4 x float> undef, undef
; SLM-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V8F32 = frem <8 x float> undef, undef		; SLM-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %V8F32 = frem <8 x float> undef, undef
; SLM-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V16F32 = frem <16 x float> undef, undef		; SLM-NEXT: Cost Model: Found an estimated cost of 44 for instruction: %V16F32 = frem <16 x float> undef, undef
; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef		; SLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef
; SLM-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2F64 = frem <2 x double> undef, undef		; SLM-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V2F64 = frem <2 x double> undef, undef
; SLM-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V4F64 = frem <4 x double> undef, undef		; SLM-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V4F64 = frem <4 x double> undef, undef
; SLM-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V8F64 = frem <8 x double> undef, undef		; SLM-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V8F64 = frem <8 x double> undef, undef
; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; SLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
; GLM-LABEL: 'frem'		; GLM-LABEL: 'frem'
; GLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef		; GLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef
; GLM-NEXT: Cost Model: Found an estimated cost of 14 for instruction: %V4F32 = frem <4 x float> undef, undef		; GLM-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %V4F32 = frem <4 x float> undef, undef
; GLM-NEXT: Cost Model: Found an estimated cost of 28 for instruction: %V8F32 = frem <8 x float> undef, undef		; GLM-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %V8F32 = frem <8 x float> undef, undef
; GLM-NEXT: Cost Model: Found an estimated cost of 56 for instruction: %V16F32 = frem <16 x float> undef, undef		; GLM-NEXT: Cost Model: Found an estimated cost of 44 for instruction: %V16F32 = frem <16 x float> undef, undef
; GLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef		; GLM-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef
; GLM-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %V2F64 = frem <2 x double> undef, undef		; GLM-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %V2F64 = frem <2 x double> undef, undef
; GLM-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %V4F64 = frem <4 x double> undef, undef		; GLM-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V4F64 = frem <4 x double> undef, undef
; GLM-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V8F64 = frem <8 x double> undef, undef		; GLM-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V8F64 = frem <8 x double> undef, undef
; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef		; GLM-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;		;
%F32 = frem float undef, undef		%F32 = frem float undef, undef
%V4F32 = frem <4 x float> undef, undef		%V4F32 = frem <4 x float> undef, undef
%V8F32 = frem <8 x float> undef, undef		%V8F32 = frem <8 x float> undef, undef
%V16F32 = frem <16 x float> undef, undef		%V16F32 = frem <16 x float> undef, undef

%F64 = frem double undef, undef		%F64 = frem double undef, undef

llvm/test/Transforms/SLPVectorizer/RISCV/rvv-min-vector-size.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -slp-vectorizer -mtriple=riscv64 -mattr=+v \		; RUN: opt < %s -slp-vectorizer -mtriple=riscv64 -mattr=+v \
; RUN: -riscv-v-vector-bits-min=128 -S | FileCheck %s --check-prefixes=CHECK,CHECK-128		; RUN: -riscv-v-vector-bits-min=128 -S | FileCheck %s --check-prefixes=CHECK,CHECK-128
; RUN: opt < %s -slp-vectorizer -mtriple=riscv64 -mattr=+v \		; RUN: opt < %s -slp-vectorizer -mtriple=riscv64 -mattr=+v \
; RUN: -riscv-v-vector-bits-min=256 -S | FileCheck %s --check-prefixes=CHECK,CHECK-256		; RUN: -riscv-v-vector-bits-min=256 -S | FileCheck %s --check-prefixes=CHECK,CHECK-256
; RUN: opt < %s -slp-vectorizer -mtriple=riscv64 -mattr=+v \		; RUN: opt < %s -slp-vectorizer -mtriple=riscv64 -mattr=+v \
; RUN: -riscv-v-vector-bits-min=512 -S | FileCheck %s --check-prefixes=CHECK,CHECK-512		; RUN: -riscv-v-vector-bits-min=512 -S | FileCheck %s --check-prefixes=CHECK,CHECK-512

target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n64-S128"		target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n64-S128"
target triple = "riscv64"		target triple = "riscv64"

define void @foo(i64* nocapture writeonly %da) {		define void @foo(i64* nocapture writeonly %da) {
; CHECK-128-LABEL: @foo(		; CHECK-128-LABEL: @foo(
; CHECK-128-NEXT: entry:		; CHECK-128-NEXT: entry:
; CHECK-128-NEXT: [[TMP0:%.*]] = bitcast i64* [[DA:%.*]] to <2 x i64>*		; CHECK-128-NEXT: store i64 0, i64* [[DA:%.*]], align 8
; CHECK-128-NEXT: store <2 x i64> <i64 0, i64 1>, <2 x i64>* [[TMP0]], align 8		; CHECK-128-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, i64* [[DA]], i64 1
		; CHECK-128-NEXT: store i64 1, i64* [[ARRAYIDX1]], align 8
; CHECK-128-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i64, i64* [[DA]], i64 2		; CHECK-128-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i64, i64* [[DA]], i64 2
; CHECK-128-NEXT: [[TMP1:%.*]] = bitcast i64* [[ARRAYIDX2]] to <2 x i64>*		; CHECK-128-NEXT: store i64 2, i64* [[ARRAYIDX2]], align 8
; CHECK-128-NEXT: store <2 x i64> <i64 2, i64 3>, <2 x i64>* [[TMP1]], align 8		; CHECK-128-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i64, i64* [[DA]], i64 3
		; CHECK-128-NEXT: store i64 3, i64* [[ARRAYIDX3]], align 8
; CHECK-128-NEXT: ret void		; CHECK-128-NEXT: ret void
;		;
; CHECK-256-LABEL: @foo(		; CHECK-256-LABEL: @foo(
; CHECK-256-NEXT: entry:		; CHECK-256-NEXT: entry:
; CHECK-256-NEXT: [[TMP0:%.*]] = bitcast i64* [[DA:%.*]] to <4 x i64>*		; CHECK-256-NEXT: [[TMP0:%.*]] = bitcast i64* [[DA:%.*]] to <4 x i64>*
; CHECK-256-NEXT: store <4 x i64> <i64 0, i64 1, i64 2, i64 3>, <4 x i64>* [[TMP0]], align 8		; CHECK-256-NEXT: store <4 x i64> <i64 0, i64 1, i64 2, i64 3>, <4 x i64>* [[TMP0]], align 8
; CHECK-256-NEXT: ret void		; CHECK-256-NEXT: ret void
;		;
Show All 12 Lines	entry:
%arrayidx3 = getelementptr inbounds i64, i64* %da, i64 3		%arrayidx3 = getelementptr inbounds i64, i64* %da, i64 3
store i64 3, i64* %arrayidx3, align 8		store i64 3, i64* %arrayidx3, align 8
ret void		ret void
}		}

define void @foo8(i8* nocapture writeonly %da) {		define void @foo8(i8* nocapture writeonly %da) {
; CHECK-LABEL: @foo8(		; CHECK-LABEL: @foo8(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[DA:%.*]] to <2 x i8>*		; CHECK-NEXT: store i8 0, i8* [[DA:%.*]], align 8
; CHECK-NEXT: store <2 x i8> <i8 0, i8 1>, <2 x i8>* [[TMP0]], align 8		; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i8, i8* [[DA]], i8 1
		; CHECK-NEXT: store i8 1, i8* [[ARRAYIDX1]], align 8
; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i8, i8* [[DA]], i8 2		; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i8, i8* [[DA]], i8 2
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
store i8 0, i8* %da, align 8		store i8 0, i8* %da, align 8
%arrayidx1 = getelementptr inbounds i8, i8* %da, i8 1		%arrayidx1 = getelementptr inbounds i8, i8* %da, i8 1
store i8 1, i8* %arrayidx1, align 8		store i8 1, i8* %arrayidx1, align 8
%arrayidx2 = getelementptr inbounds i8, i8* %da, i8 2		%arrayidx2 = getelementptr inbounds i8, i8* %da, i8 2
ret void		ret void
}		}

llvm/test/Transforms/SLPVectorizer/X86/crash_bullet.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.8.0"			target triple = "x86_64-apple-macosx10.8.0"

	%"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960" = type { i32, i32 }			%"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960" = type { i32, i32 }

	define void @_ZN23btGeneric6DofConstraint8getInfo1EPN17btTypedConstraint17btConstraintInfo1E(%"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960"* nocapture %info) {			define void @_ZN23btGeneric6DofConstraint8getInfo1EPN17btTypedConstraint17btConstraintInfo1E(%"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960"* nocapture %info) {
	; CHECK-LABEL: @_ZN23btGeneric6DofConstraint8getInfo1EPN17btTypedConstraint17btConstraintInfo1E(			; CHECK-LABEL: @_ZN23btGeneric6DofConstraint8getInfo1EPN17btTypedConstraint17btConstraintInfo1E(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 undef, label [[IF_ELSE:%.*]], label [[IF_THEN:%.*]]			; CHECK-NEXT: br i1 undef, label [[IF_ELSE:%.*]], label [[IF_THEN:%.*]]
	; CHECK: if.then:			; CHECK: if.then:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: if.else:			; CHECK: if.else:
	; CHECK-NEXT: [[M_NUMCONSTRAINTROWS4:%.*]] = getelementptr inbounds %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960", %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960"* [[INFO:%.*]], i64 0, i32 0			; CHECK-NEXT: [[M_NUMCONSTRAINTROWS4:%.*]] = getelementptr inbounds %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960", %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960"* [[INFO:%.*]], i64 0, i32 0
				; CHECK-NEXT: [[NUB5:%.*]] = getelementptr inbounds %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960", %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960"* [[INFO]], i64 0, i32 1
	; CHECK-NEXT: br i1 undef, label [[LAND_LHS_TRUE_I_1:%.*]], label [[IF_THEN7_1:%.*]]			; CHECK-NEXT: br i1 undef, label [[LAND_LHS_TRUE_I_1:%.*]], label [[IF_THEN7_1:%.*]]
	; CHECK: land.lhs.true.i.1:			; CHECK: land.lhs.true.i.1:
	; CHECK-NEXT: br i1 undef, label [[FOR_INC_1:%.*]], label [[IF_THEN7_1]]			; CHECK-NEXT: br i1 undef, label [[FOR_INC_1:%.*]], label [[IF_THEN7_1]]
	; CHECK: if.then7.1:			; CHECK: if.then7.1:
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[M_NUMCONSTRAINTROWS4]] to <2 x i32>*			; CHECK-NEXT: store i32 1, i32* [[M_NUMCONSTRAINTROWS4]], align 4
	; CHECK-NEXT: store <2 x i32> <i32 1, i32 5>, <2 x i32>* [[TMP0]], align 4			; CHECK-NEXT: store i32 5, i32* [[NUB5]], align 4
	; CHECK-NEXT: br label [[FOR_INC_1]]			; CHECK-NEXT: br label [[FOR_INC_1]]
	; CHECK: for.inc.1:			; CHECK: for.inc.1:
	; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ <i32 1, i32 5>, [[IF_THEN7_1]] ], [ <i32 0, i32 6>, [[LAND_LHS_TRUE_I_1]] ]			; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x i32> [ <i32 1, i32 5>, [[IF_THEN7_1]] ], [ <i32 0, i32 6>, [[LAND_LHS_TRUE_I_1]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = add nsw <2 x i32> [[TMP1]], <i32 1, i32 -1>			; CHECK-NEXT: [[TMP1:%.*]] = add nsw <2 x i32> [[TMP0]], <i32 1, i32 -1>
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[M_NUMCONSTRAINTROWS4]] to <2 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[M_NUMCONSTRAINTROWS4]] to <2 x i32>*
	; CHECK-NEXT: store <2 x i32> [[TMP2]], <2 x i32>* [[TMP3]], align 4			; CHECK-NEXT: store <2 x i32> [[TMP1]], <2 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;
	entry:			entry:
	br i1 undef, label %if.else, label %if.then			br i1 undef, label %if.else, label %if.then

	if.then: ; preds = %entry			if.then: ; preds = %entry
	ret void			ret void



Revision Contents

Diff 454002

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

llvm/test/Analysis/CostModel/X86/arith-fp.ll

llvm/test/Transforms/SLPVectorizer/RISCV/rvv-min-vector-size.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_bullet.ll
