This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Cost for a constant buildvector.
Closed, Public

Authored by ABataev on Jun 2 2022, 7:56 AM.

Details

Reviewers

RKSimon
craig.topper

Commits

rG0e7ed32c7136: [SLP]Cost for a constant buildvector.

Summary

Usually, constant buildvector results in a vector load from a
constant/data pool.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ABataev created this revision. Jun 2 2022, 7:56 AM

Herald added a project: Restricted Project. Jun 2 2022, 7:56 AM

Herald added subscribers: vporpo, StephenFan, frasercrmck and 21 others.

ABataev requested review of this revision. Jun 2 2022, 7:56 AM

Herald added subscribers: pcwang-thead, MaskRay.

Harbormaster completed remote builds in B167520: Diff 433753. Jun 2 2022, 8:33 AM

Some unfortunate regressions

llvm/test/Transforms/SLPVectorizer/X86/pr46983.ll
146 ↗	(On Diff #433753)	Regression https://github.com/llvm/llvm-project/issues/46327

vdmitrie added a subscriber: vdmitrie. Jun 2 2022, 8:49 AM

vdmitrie added inline comments.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5810	This estimate should be a bit more complicated. Here are the things that can additionally be considered:
- For scalar floating-point ops, a constant operand is normally loaded from memory too.
- If the constant is a splat operand of an instruction where it becomes an immediate (like a shift value), the cost is zero.
- For a scalar integer op, a constant operand is typically an immediate, so this estimate works in most cases. There is one exception: 64-bit operations on a 32-bit target. That should be taken into account too.
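The three considerations above can be sketched as a toy cost comparison. This is only an illustration: every name and every unit cost below is hypothetical, and nothing here is an LLVM API or the code from the patch.

```cpp
#include <cassert>

// Hypothetical, simplified model. None of these names are LLVM APIs.
enum class ScalarKind { Integer, FloatingPoint };

// Assumed unit costs: 0 = the constant is encoded as an immediate,
// 1 = the constant needs a load (or some other materialization).
const int FreeCost = 0;
const int LoadCost = 1;

// Cost of one scalar constant operand before vectorization.
// FitsLegalInt is false for, e.g., a 64-bit integer op on a 32-bit target.
int scalarConstOperandCost(ScalarKind Kind, bool FitsLegalInt) {
  if (Kind == ScalarKind::FloatingPoint)
    return LoadCost; // FP constants are normally loaded from memory
  return FitsLegalInt ? FreeCost : LoadCost;
}

// Cost of the vectorized constant operand. IsSplatImmediate is true when the
// user instruction can encode the splat constant directly, such as a uniform
// shift amount.
int vectorConstOperandCost(bool IsSplatImmediate) {
  return IsSplatImmediate ? FreeCost : LoadCost;
}

// Net cost change attributed to the constant operand when NumLanes scalar
// instructions become one vector instruction (negative favours vectorizing).
int constOperandCostDelta(ScalarKind Kind, bool FitsLegalInt,
                          bool IsSplatImmediate, int NumLanes) {
  return vectorConstOperandCost(IsSplatImmediate) -
         NumLanes * scalarConstOperandCost(Kind, FitsLegalInt);
}
```

With these assumed unit costs, four integer immediates turning into one constant-pool load give a delta of +1, while four floating-point constants give -3, which is the asymmetry the comment points at.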

Address comments.

Herald added subscribers: jsji, luke957, pengfei, arichardson. Jun 3 2022, 12:46 PM

ABataev added inline comments. Jun 3 2022, 12:51 PM

llvm/test/Transforms/SLPVectorizer/X86/pr46983.ll
146 ↗	(On Diff #433753)	llvm-mca reports throughputs: scalar code: 5.5; AVX vector: 8.0; AVX2 vector: 5.0. https://godbolt.org/z/rEc74dxza

xbolva00 added inline comments. Jun 3 2022, 1:03 PM

llvm/test/Transforms/SLPVectorizer/X86/pr46983.ll
146 ↗	(On Diff #433753)	Ah, right, these checks are for avx.

Harbormaster completed remote builds in B167776: Diff 434112. Jun 3 2022, 1:57 PM

icost.ll (1 KB)

Test case for collection.

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
3787	Note that it is free for splat shift value only. https://godbolt.org/z/KMf6sW5n3

In D126885#3557032, @vdmitrie wrote:

icost.ll (1 KB)

Test case for collection.

mca shows that these 2 instructions have the same cost, so it actually does not matter. It is probably worth adding some other instructions that can load params directly from memory on x86.

vdmitrie added inline comments. Jun 3 2022, 2:23 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5814	Just wondering: is it possible for UserTreeIndices to be empty here? AFAIU it can be empty for the root only, but constants do not seed the vtree. Also, if the alternate opcodes are shl/shr but the shift value is a splat, it can still be an immediate for both of them.
5833	drop it?

ABataev added inline comments. Jun 3 2022, 2:42 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5814	It can be, if the constants are reduced values in reduction ops. That's why there is a TODO above.
5833	What do you mean?

vdmitrie added inline comments. Jun 3 2022, 2:47 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5814	Okay, although I believe it is not the SLP vectorizer's job to do constant folding.
5833	Drop the extra definition of ScalarCost. Otherwise the loop at line 5807 updates the variable from line 5802, which is never used, and line 5811 subtracts the one defined at line 5795.

ABataev added inline comments. Jun 3 2022, 2:50 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5833	Ah, yes, sure.

vdmitrie added inline comments. Jun 3 2022, 3:14 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5838	Doesn't this interface already assume that a constant is a legal immediate? I was trying to explore this too, and I found that it does not seem to correctly cover 32-bit targets, specifically for 64-bit operations. Ideally we should have an interface that tells whether an immediate is a legal imm operand for a target, but I have not found anything like that. One way to figure this out (which I found, and which may be wrong): when the condition DL->getTypeStoreSizeInBits(ScalarTy) > DL->getLargestLegalIntTypeSizeInBits() is true, we cannot assume the operand is a legal immediate.
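The width check suggested above can be sketched without LLVM as follows. ToyDataLayout is a hypothetical stand-in for the DataLayout query named in the comment, and the helper name is made up for illustration; as the comment itself says, the heuristic may not be the right one.

```cpp
#include <cassert>

// Hypothetical stand-in for the DataLayout query named in the comment above.
struct ToyDataLayout {
  unsigned LargestLegalIntBits; // e.g. 32 on a 32-bit target

  unsigned getLargestLegalIntTypeSizeInBits() const {
    return LargestLegalIntBits;
  }
};

// Returns false when the scalar is wider than the largest legal integer
// type, in which case the constant should not be assumed to be a legal
// immediate operand.
bool mayAssumeLegalImmediate(unsigned ScalarStoreSizeInBits,
                             const ToyDataLayout &DL) {
  return ScalarStoreSizeInBits <= DL.getLargestLegalIntTypeSizeInBits();
}
```

Under this sketch a 64-bit scalar on a target whose largest legal integer is 32 bits is rejected, while a 32-bit scalar on the same target is accepted.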

ABataev added inline comments. Jun 3 2022, 3:24 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5838	I'll check it.

ABataev added inline comments. Jun 6 2022, 9:02 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5814	Do you suggest hiding it in getConstBuildVectorInstrCost and returning the difference? Or just adding a new member function?

vdmitrie added inline comments. Jun 6 2022, 11:34 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
5814	Alternate opcodes are specific to the SLP vectorizer. For that reason, sinking that logic into the TTI interface does not look like the right thing to do. But outlining this whole new code into a separate member is a good idea. What sounds weird to me is that constants may seed the vtree for a reduction. That is not directly related to this patch, but you are placing workarounds for it here. IMO it is impractical to run a reduction of constants through the SLP vectorizer machinery. To make the workaround simpler in this patch, you could add an early return: if (E->UserTreeIndices.empty()) return 0; Otherwise it will return a memory-op cost for a foldable operation.
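The suggested early return can be illustrated with a miniature tree entry. ToyTreeEntry, its fields, and the unit cost are hypothetical; the real SLP TreeEntry carries far more state, and this is not the code that landed.

```cpp
#include <cassert>
#include <vector>

// Hypothetical miniature of an SLP tree entry.
struct ToyTreeEntry {
  std::vector<int> UserTreeIndices; // indices of the entries that use this one
  bool IsConstantGather;            // all lanes are constants
};

// Assumed unit cost of loading a constant vector from the constant pool.
const int ConstantPoolLoadCost = 1;

int constBuildVectorCost(const ToyTreeEntry &E) {
  if (!E.IsConstantGather)
    return 0; // not modelled in this sketch
  // No users: this entry is the root of a constants-only reduction, which is
  // expected to fold away, so no memory-op cost is charged.
  if (E.UserTreeIndices.empty())
    return 0;
  return ConstantPoolLoadCost;
}
```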

Address comments

Herald added subscribers: kbarton, nemanjai. Jun 8 2022, 8:19 AM

ABataev added inline comments. Jun 8 2022, 8:20 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

5814

What sounds weird for me is that constants may seed vtree for reduction.

InstCombiner and other passes are not always able to handle them (or they require some extra work and compile time). E.g.:

define i32 @foo(i32 %v, i32 %a) {
  %s1 = add i32 %v, 1
  %s2 = add i32 %a, 2
  %s3 = add i32 %s1, %s2
  %s11 = add i32 %v, %a
  %s31 = add i32 %s3, %s11
  %s4 = add i32 %v, 3
  %s5 = add i32 %a, 4
  %s6 = add i32 %s4, %s5
  %s7 = add i32 %s31, %s6
  ret i32 %s7
}

SLP is able to transform it to:

define i32 @foo(i32 %v, i32 %a) {
  %1 = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> <i32 4, i32 3, i32 2, i32 1>)
  %op.rdx = add i32 %a, %a
  %op.rdx1 = add i32 %a, %v
  %op.rdx2 = add i32 %v, %v
  %op.rdx3 = add i32 %op.rdx, %op.rdx1
  %op.rdx4 = add i32 %op.rdx3, %op.rdx2
  %op.rdx5 = add i32 %1, %op.rdx4
  ret i32 %op.rdx5
}

in which the reduction call can then be constant-folded:

%1 = i32 10

But I agree that it requires improvement. We don't need to estimate the cost and emit a reduction here. I have a patch that improves it. I need to work on it for some time, though.
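As a sanity check of the example above, the two IR functions can be transcribed into C++ and compared. Unsigned 32-bit arithmetic is used so that additions wrap the same way as the plain i32 adds in the IR; the function names are made up for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Transcription of the original scalar IR.
uint32_t fooScalar(uint32_t v, uint32_t a) {
  uint32_t s1 = v + 1;
  uint32_t s2 = a + 2;
  uint32_t s3 = s1 + s2;
  uint32_t s11 = v + a;
  uint32_t s31 = s3 + s11;
  uint32_t s4 = v + 3;
  uint32_t s5 = a + 4;
  uint32_t s6 = s4 + s5;
  return s31 + s6;
}

// Transcription of the SLP output: the constants are summed by a vector
// reduction, and the remaining operands are added pairwise.
uint32_t fooAfterSLP(uint32_t v, uint32_t a) {
  const uint32_t lanes[4] = {4, 3, 2, 1};
  uint32_t reduced = 0; // stands in for llvm.vector.reduce.add
  for (uint32_t lane : lanes)
    reduced += lane;
  uint32_t rdx = a + a;
  uint32_t rdx1 = a + v;
  uint32_t rdx2 = v + v;
  uint32_t rdx3 = rdx + rdx1;
  uint32_t rdx4 = rdx3 + rdx2;
  return reduced + rdx4;
}
```

Both functions compute 3*v + 3*a + 10 modulo 2^32, so the reduction over the constant vector contributes exactly the folded value 10.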

Harbormaster completed remote builds in B168583: Diff 435177. Jun 8 2022, 10:16 AM

Rebase

Herald added a subscriber: nlopes. Jun 14 2022, 1:41 PM

Harbormaster completed remote builds in B169816: Diff 436911. Jun 14 2022, 3:16 PM

RKSimon added inline comments. Jun 16 2022, 3:32 AM

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
3787	Yes, vector shifts must be splats without AVX2 or XOP
3807	Instruction::Add/Sub? Also, we'd need to allow Idx == 0 || Idx == 1 for commutable ops.
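The two points above (splat-only shifts without AVX2 or XOP, and accepting either operand index for commutable ops) can be sketched as simple predicates. The feature struct and all function names are hypothetical and deliberately ignore element widths; this is not the X86 TTI code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical feature flags; not the real X86 subtarget interface.
struct ToyX86Features {
  bool HasAVX2;
  bool HasXOP;
};

// True when every lane holds the same value.
bool isSplat(const std::vector<int64_t> &Lanes) {
  if (Lanes.empty())
    return false;
  for (int64_t L : Lanes)
    if (L != Lanes.front())
      return false;
  return true;
}

// A constant shift amount is modelled as free only when it is a splat.
bool shiftAmountIsFree(const std::vector<int64_t> &Amounts) {
  return isSplat(Amounts);
}

// Non-splat shift amounts are modelled as requiring AVX2 or XOP.
bool vectorShiftSupported(const std::vector<int64_t> &Amounts,
                          ToyX86Features F) {
  return isSplat(Amounts) || F.HasAVX2 || F.HasXOP;
}

// Operand positions where a constant is modelled as foldable: operand 1
// always, and operand 0 as well when the operation is commutative.
bool constOperandMayFold(unsigned Idx, bool IsCommutative) {
  return Idx == 1 || (IsCommutative && Idx == 0);
}
```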

shchenz added a subscriber: shchenz. Jun 16 2022, 3:53 AM

ABataev added inline comments. Jun 16 2022, 10:17 AM

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
3807	I excluded Add/Sub here because a scalar Add/Sub with an immediate costs less than the vector Add/Sub (0.2-0.33 vs. ~0.5). We can add it later; currently there is no such analysis in getIntImmCostInst.

Address comments

Harbormaster completed remote builds in B170350: Diff 437675. Jun 16 2022, 2:32 PM

Ping!

RKSimon added inline comments. Jun 22 2022, 10:10 AM

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
3800	XOP can memory fold from Idx == 0 as well.

Removed getConstantBuildVectorCost; the analysis for constant values already exists in getArithmeticInstrCost. Added support for a constant operand of stores in the getMemoryOpCost function.

Harbormaster completed remote builds in B171421: Diff 439156. Jun 22 2022, 3:09 PM

Rebase

Harbormaster completed remote builds in B171896: Diff 439821. Jun 24 2022, 11:47 AM

Rebase

Harbormaster completed remote builds in B172460: Diff 440594. Jun 28 2022, 7:13 AM

Rebase

Harbormaster completed remote builds in B180695: Diff 451880. Aug 11 2022, 10:35 AM

RKSimon added inline comments. Aug 15 2022, 4:15 AM

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
4115	Just do this once before the (!VTy || !LT.second.isVector()) check?

Address comment

Harbormaster completed remote builds in B181265: Diff 452648. Aug 15 2022, 7:30 AM

LGTM - it might be worth splitting out the refactoring that adds the OperandValueKind arg to getMemoryOpCost? That way any fallout from the cost changes is more localised.

This revision is now accepted and ready to land. Aug 17 2022, 10:11 AM

In D126885#3729272, @RKSimon wrote:

LGTM - it might be worth splitting out the refactoring that adds the OperandValueKind arg to getMemoryOpCost? That way any fallout from the cost changes is more localised.

Ok, will commit in a separate patch.

ABataev mentioned this in rGd53e245951f8: [COST][NFC]Introduce OperandValueKind in getMemoryOpCost, NFC. Aug 19 2022, 7:34 AM

This revision was landed with ongoing or failed builds. Aug 19 2022, 8:04 AM

Closed by commit rG0e7ed32c7136: [SLP]Cost for a constant buildvector. (authored by ABataev).

This revision was automatically updated to reflect the committed changes.

ABataev added a commit: rG0e7ed32c7136: [SLP]Cost for a constant buildvector..

Coming into this a bit late.

I stumbled into this myself when looking at the impact of SLP on RISC-V. I think this is addressing an important problem, but I'm really not happy with the structure of the change that landed.

We have a general problem here of needing to account for cost of a constant build vector. This change ended up being specific to stores of constant build vectors, but the same basic problem still exists if e.g. you have a load, add constant-build-vector, and store sequence which gets vectorized. The problem here is not in any way related to the cost of the store; it's related to the cost of materializing the value to be stored.

There's an additional problem that the cost model added for RISC-V is way overly simplistic. It's out of sync with the existing build vector lowering code, and thus will result in costs which differ from the actual lowering chosen. More importantly, the interface chosen in this patch prevents a more sophisticated cost model from being used.

I think we need to undo this, and return to the getConstBuildVectorInstrCost approach used in early versions of this patch. There was a mention of existing build vector costing in getConstBuildVectorInstrCost, but I can't find this in generic code. Can you point me to the code you were referring to?

p.s. I used the word "undo" specifically to avoid "revert". I'm not asking the change be reverted, simply that we work in the direction of a better interface overall. Doing that will have the effect of semantically reverting the landed change, but I'm not picky about the order of operations here.

In D126885#3735457, @reames wrote:

Coming into this a bit late.

I stumbled into this myself when looking at the impact of SLP on RISC-V. I think this is addressing an important problem, but I'm really not happy with the structure of the change that landed.

We have a general problem here of needing to account for cost of a constant build vector. This change ended up being specific to stores of constant build vectors, but the same basic problem still exists if e.g. you have a load, add constant-build-vector, and store sequence which gets vectorized. The problem here is not in any way related to the cost of the store; it's related to the cost of materializing the value to be stored.

There's an additional problem that the cost model added for RISC-V is way overly simplistic. It's out of sync with the existing build vector lowering code, and thus will result in costs which differ from the actual lowering chosen. More importantly, the interface chosen in this patch prevents a more sophisticated cost model from being used.

I think we need to undo this, and return to the getConstBuildVectorInstrCost approach used in early versions of this patch. There was a mention of existing build vector costing in getConstBuildVectorInstrCost, but I can't find this in generic code. Can you point me to the code you were referring to?

Check the cost model of arithmetic instructions etc, they already include the cost analysis for constant values.

p.s. I used the word "undo" specifically to avoid "revert". I'm not asking the change be reverted, simply that we work in the direction of a better interface overall. Doing that will have the effect of semantically reverting the landed change, but I'm not picky about the order of operations here.

I tried initially to implement it, but our cost model already includes the cost of constants/constant buildvectors for many operations. It requires a significant rework of TTI and some extra investigation, because we need to account for the cross-dependency between constants and the operations. And I'm not sure it would be better/easier to implement; it requires some extra (re)design investigation.

In D126885#3735500, @ABataev wrote:

Check the cost model of arithmetic instructions etc, they already include the cost analysis for constant values.

I think I found the code you're referring to in X86TTIImpl::getArithmeticInstrCost. I'd summarize this code as follows: we have various alternate cost tables which seem to assume that one constant splat operand, or sometimes just one constant operand, gets folded into the instruction. I don't know enough about avx512 instruction encoding to reason about this, but I will accept that it exists. Though it does look very weird to me that *all* constants are assumed to be folded into the encoding? Whatever, out of scope for this discussion.

I tried initially to implement it, but our cost model already includes the cost of constants/constant buildvectors for many operations. It requires a significant rework of TTI and some extra investigation, because we need to account for the cross-dependency between constants and the operations. And I'm not sure it would be better/easier to implement; it requires some extra (re)design investigation.

Ok, so I see the concern here. I'm not thrilled with the conclusion, but I think I agree that the current state of the art is having each operation reason about the cost of the constant materialization independently.

Given that, I see why you took the approach you did here.

However, we're still left with the problem that the current interface is insufficient for RISC-V. On the vector side, we can generate various non-splat sequences (e.g. vid and friends) at low cost. As such, the expressiveness of the current interface isn't really sufficient.
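To make the "non-splat but cheap" case concrete: a constant vector whose lanes follow Start + i * Step is the kind of pattern an index-generating instruction such as vid.v can help materialize. The matcher below is only a sketch with made-up names; it is not the RISC-V backend's lowering logic, and it assumes lane differences do not overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Matches Lanes[i] == Start + i * Step and reports Start and Step on success.
// Step == 0 is the splat case. Vectors with fewer than two lanes are rejected
// to keep the sketch simple.
bool matchStepSequence(const std::vector<int64_t> &Lanes, int64_t &Start,
                       int64_t &Step) {
  if (Lanes.size() < 2)
    return false;
  const int64_t CandidateStep = Lanes[1] - Lanes[0];
  for (std::size_t I = 2; I < Lanes.size(); ++I)
    if (Lanes[I] - Lanes[I - 1] != CandidateStep)
      return false;
  Start = Lanes[0];
  Step = CandidateStep;
  return true;
}
```

An interface that only distinguishes "uniform constant" from "non-uniform constant" cannot tell <0, 1, 2, 3> apart from an arbitrary constant vector, which is the gap described above.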

I see two possible paths, both with downsides. I'm curious what you think:

1. Extend the OperandValueProperties enum with a bunch more options for describing build vectors. I don't really see the semantic distinction between OperandValueProperties and OperandValueKind, so we'd probably end up merging them into a single info struct with a bunch more properties on it. This arguably works more naturally with scalable vectors, but it's a bunch of complexity.
2. Add the getConstBuildVectorInstrCost interface anyway. Document the contract as being to return zero cost when the constant could fold into the using instruction. Existing backends which don't need the additional expressibility continue with the old scheme; RISC-V uses this approach to cost build vectors instead (i.e. arithmetic cost et al. don't include constant materialization costs).

As I said, both approaches have some obvious downsides. If you have an alternate idea, definitely open to hearing it.

Also, to be clear, I've accepted that this patch is reasonable. I'm asking now about future direction for my own work, not asking for you to volunteer for any of the above. :)
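The contract proposed for the second path (return zero when the constant folds into the using instruction) can be sketched as follows. ToyTargetCosts and its fields are hypothetical and exist only for this illustration; no such TTI hook is implied to exist in this form.

```cpp
#include <cassert>

// Hypothetical per-target hook following the proposed contract.
struct ToyTargetCosts {
  bool FoldsConstantsIntoUser; // assumed target capability
  int MaterializationCost;     // assumed cost to build the vector otherwise

  // Zero when the constant folds into the using instruction.
  int getConstBuildVectorCost() const {
    return FoldsConstantsIntoUser ? 0 : MaterializationCost;
  }
};

// Client-side total: the arithmetic cost excludes constant materialization,
// and the hook adds it back only when it is actually paid.
int costWithConstOperand(int ArithmeticCost, const ToyTargetCosts &T) {
  return ArithmeticCost + T.getConstBuildVectorCost();
}
```

The point of the split is that a backend keeping the old scheme simply reports zero from the hook, while a backend like RISC-V could report a materialization cost there instead of inside each arithmetic cost query.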

In D126885#3735638, @reames wrote:

Ok, so I see the concern here. I'm not thrilled with the conclusion, but I think I agree that the current state of the art is having each operation reason about the cost of the constant materialization independently.

Given that, I see why you took the approach you did here.

However, we're still left with the problem that the current interface is insufficient for RISC-V. On the vector side, we can generate various non-splat sequences (e.g. vid and friends) at low cost. As such, the expressiveness of the current interface isn't really sufficient.

I see two possible paths, both with downsides. I'm curious what you think:

1. Extend the OperandValueProperties enum with a bunch more options for describing build vectors. I don't really see the semantic distinction between OperandValueProperties and OperandValueKind, so we'd probably end up merging them into a single info struct with a bunch more properties on it. This arguably works more naturally with scalable vectors, but it's a bunch of complexity.

2. Add the getConstBuildVectorInstrCost interface anyway. Document the contract as being to return zero cost when the constant could fold into the using instruction. Existing backends which don't need the additional expressibility continue with the old scheme; RISC-V uses this approach to cost build vectors instead (i.e. arithmetic cost et al. don't include constant materialization costs).

As I said, both approaches have some obvious downsides. If you have an alternate idea, definitely open to hearing it.

I would do both (in some way) as a first step. Introduce getConstBuildVectorInstrCost (local to the RISC-V TTI interface) and use it in TTI functions (I mean in getArithmeticInstrCost, getMemoryOpCost, etc.) for better constant build vector cost estimation (if the user provides operands or OperandValueProperties). Later we can make it public for all TTI interfaces. Thoughts?

Also, to be clear, I've accepted that this patch is reasonable. I'm asking now about future direction for my own work, not asking for you to volunteer for any of the above. :)

I understand, no problem.

Refactoring OperandValueKind/Properties from enums into a single properties list has come up several times (IIRC KnownNeverZero/KnownNeverNegative properties and even KnownBits/SignBits/Min+Max have been mentioned as useful for some cases).

TBH just merging them as an initial cleanup (and improving TargetTransformInfo::getOperandInfo) would be worth it and would make it easier for future changes.

In D126885#3735695, @ABataev wrote:

I would do both (in some way) as a first step. Introduce getConstBuildVectorInstrCost (local to the RISC-V TTI interface) and use it in TTI functions (I mean in getArithmeticInstrCost, getMemoryOpCost, etc.) for better constant build vector cost estimation (if the user provides operands or OperandValueProperties). Later we can make it public for all TTI interfaces. Thoughts?

I split out the costing code in 59960e8d with plans to extend it. However, this isn't quite the same as getConstBuildVectorInstrCost as we don't have the actual values forming the build vector. For that, we'd need to significantly change the interface of TTI to pass through all of the Values making up the build vector.

In D126885#3735736, @RKSimon wrote:

Refactoring OperandValueKind/Properties from enums into a single properties list has come up several times (IIRC KnownNeverZero/KnownNeverNegative properties and even KnownBits/SignBits/Min+Max have been mentioned as useful for some cases).

TBH just merging them as an initial cleanup (and improving TargetTransformInfo::getOperandInfo) would be worth it and would make it easier for future changes.

Given this has come up many times, I went ahead and did it. Changes to wrap both existing property enums in a class have been plumbed through all of TTI and client code. If we want to add new properties, it should be pretty straightforward to do so.
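A rough picture of what wrapping the two enums in one class can look like is given below. The enumerator names are meant to mirror the existing TTI enums, but the exact set, the struct name ToyOperandInfo, and the helper methods are assumptions made for this sketch rather than the actual LLVM definitions.

```cpp
#include <cassert>

// Assumed to mirror the TTI enums; treat the exact enumerators as illustrative.
enum OperandValueKind {
  OK_AnyValue,
  OK_UniformValue,
  OK_UniformConstantValue,
  OK_NonUniformConstantValue
};
enum OperandValueProperties { OP_None, OP_PowerOf2 };

// Hypothetical wrapper carrying both pieces of operand information, so that
// new properties can be added in one place.
struct ToyOperandInfo {
  OperandValueKind Kind = OK_AnyValue;
  OperandValueProperties Properties = OP_None;

  bool isConstant() const {
    return Kind == OK_UniformConstantValue ||
           Kind == OK_NonUniformConstantValue;
  }
  bool isUniform() const {
    return Kind == OK_UniformValue || Kind == OK_UniformConstantValue;
  }
  bool isPowerOf2() const { return Properties == OP_PowerOf2; }
};
```

Cost hooks can then take one such object per operand instead of a kind and a properties argument each.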

I'm still not sure this is the right direction overall - as opposed to costing the actual constant value - but I'm still toying with ideas here. One thing I did notice is that basically only X86 costs immediate operands with the current approach. So this really isn't "existing targets do X"; it's "most targets ignore this issue, and X86 does X".

FYI, this change causes a regression with Flang due to store-forwarding issues. I am not sure if it is Flang-specific - please take a look: https://github.com/llvm/llvm-project/issues/57322

reames mentioned this in D132566: [SLP] Fix cost model w.r.t. operand properties. Aug 24 2022, 8:12 AM

reames mentioned this in D132680: [RISCV] Disable SLP vectorization by default due to unresolved profitability issues. Aug 25 2022, 10:48 AM

FYI this change caused a noticeable compile-time regression: http://llvm-compile-time-tracker.com/compare.php?from=31fbcccb3136b9da99e7bc95007e553403fcd641&to=0e7ed32c71362f3547329c6ee8573a8bc191f58a&stat=instructions Highest impact seems to be 7.5% on constants.c from mafft. I don't see anything obvious that can be optimized here though.

In D126885#3755038, @nikic wrote:

FYI this change caused a noticeable compile-time regression: http://llvm-compile-time-tracker.com/compare.php?from=31fbcccb3136b9da99e7bc95007e553403fcd641&to=0e7ed32c71362f3547329c6ee8573a8bc191f58a&stat=instructions Highest impact seems to be 7.5% on constants.c from mafft. I don't see anything obvious that can be optimized here though.

Hopefully D132750 will fix it in some cases, at least for FP.

reames mentioned this in rG42ef5720493e: [SLP] Fix cost model w.r.t. operand properties. Sep 23 2022, 8:40 AM

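Revision Contents

Path	Size
llvm/include/llvm/Analysis/TargetTransformInfo.h	16 lines
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h	7 lines
llvm/include/llvm/CodeGen/BasicTTIImpl.h	7 lines
llvm/lib/Analysis/TargetTransformInfo.cpp	8 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp	4 lines
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp	4 lines
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp	4 lines
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h	5 lines
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp	26 lines
llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp	5 lines
llvm/lib/Target/X86/X86TargetTransformInfo.h	4 lines
llvm/lib/Target/X86/X86TargetTransformInfo.cpp	57 lines
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	56 lines
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll	17 lines
llvm/test/Transforms/PhaseOrdering/fast-basictest.ll	19 lines
llvm/test/Transforms/SLPVectorizer/AArch64/memory-runtime-checks.ll	37 lines
llvm/test/Transforms/SLPVectorizer/RISCV/rvv-min-vector-size.ll	15 lines
llvm/test/Transforms/SLPVectorizer/X86/PR31847.ll	62 lines
llvm/test/Transforms/SLPVectorizer/X86/alternate-fp-inseltpoison.ll	37 lines
(file name not captured)	37 lines
(file name not captured)	374 lines
(file name not captured)	17 lines
(file name not captured)	25 lines
(file name not captured)	13 lines
llvm/test/Transforms/SLPVectorizer/X86/crash_reordering_undefs.ll	25 lines
(file name not captured)	21 lines
(file name not captured)	19 lines
(file name not captured)	28 lines
(file name not captured)	36 lines
(file name not captured)	56 lines
(file name not captured)	65 lines
(file name not captured)	144 lines
(file name not captured)	49 lines
(file name not captured)	37 lines
llvm/test/Transforms/SLPVectorizer/X86/pr47629-inseltpoison.ll	76 lines
llvm/test/Transforms/SLPVectorizer/X86/pr47629.ll	76 lines
llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll	228 lines
llvm/test/Transforms/SLPVectorizer/X86/reduction2.ll	17 lines
llvm/test/Transforms/SLPVectorizer/X86/vect_copyable_in_binops.ll	42 lines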

Diff 436911

llvm/include/llvm/Analysis/TargetTransformInfo.h

Show First 20 Lines • Show All 1,137 Lines • ▼ Show 20 Lines	getCmpSelInstrCost(unsigned Opcode, Type ValTy, Type CondTy,
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,		TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput,
const Instruction *I = nullptr) const;		const Instruction *I = nullptr) const;

/// \return The expected cost of vector Insert and Extract.		/// \return The expected cost of vector Insert and Extract.
/// Use -1 to indicate that there is no information on the index value.		/// Use -1 to indicate that there is no information on the index value.
InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,		InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
unsigned Index = -1) const;		unsigned Index = -1) const;

		/// \return The cost of the constant buildvector sequence.
		InstructionCost
		getConstBuildVectorInstrCost(VectorType *VecTy, unsigned UserOpcode,
		unsigned Idx,
		TTI::TargetCostKind CostKind) const;

/// \return The cost of replication shuffle of \p VF elements typed \p EltTy		/// \return The cost of replication shuffle of \p VF elements typed \p EltTy
/// \p ReplicationFactor times.		/// \p ReplicationFactor times.
///		///
/// For example, the mask for \p ReplicationFactor=3 and \p VF=4 is:		/// For example, the mask for \p ReplicationFactor=3 and \p VF=4 is:
/// <0,0,0,1,1,1,2,2,2,3,3,3>		/// <0,0,0,1,1,1,2,2,2,3,3,3>
InstructionCost getReplicationShuffleCost(Type *EltTy, int ReplicationFactor,		InstructionCost getReplicationShuffleCost(Type *EltTy, int ReplicationFactor,
int VF,		int VF,
const APInt &DemandedDstElts,		const APInt &DemandedDstElts,
(548 lines not shown)	virtual InstructionCost getCmpSelInstrCost(unsigned Opcode, Type *ValTy,
Type *CondTy,		Type *CondTy,
CmpInst::Predicate VecPred,		CmpInst::Predicate VecPred,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I) = 0;		const Instruction *I) = 0;
virtual InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,		virtual InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
unsigned Index) = 0;		unsigned Index) = 0;

virtual InstructionCost		virtual InstructionCost
		getConstBuildVectorInstrCost(VectorType *VecTy, unsigned UserOpcode,
		unsigned Idx, TTI::TargetCostKind CostKind) = 0;

		virtual InstructionCost
getReplicationShuffleCost(Type *EltTy, int ReplicationFactor, int VF,		getReplicationShuffleCost(Type *EltTy, int ReplicationFactor, int VF,
const APInt &DemandedDstElts,		const APInt &DemandedDstElts,
TTI::TargetCostKind CostKind) = 0;		TTI::TargetCostKind CostKind) = 0;

virtual InstructionCost getMemoryOpCost(unsigned Opcode, Type *Src,		virtual InstructionCost getMemoryOpCost(unsigned Opcode, Type *Src,
Align Alignment,		Align Alignment,
unsigned AddressSpace,		unsigned AddressSpace,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
(525 lines not shown)	InstructionCost getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,
const Instruction *I) override {		const Instruction *I) override {
return Impl.getCmpSelInstrCost(Opcode, ValTy, CondTy, VecPred, CostKind, I);		return Impl.getCmpSelInstrCost(Opcode, ValTy, CondTy, VecPred, CostKind, I);
}		}
InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,		InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
unsigned Index) override {		unsigned Index) override {
return Impl.getVectorInstrCost(Opcode, Val, Index);		return Impl.getVectorInstrCost(Opcode, Val, Index);
}		}
InstructionCost		InstructionCost
		getConstBuildVectorInstrCost(VectorType *VecTy, unsigned UserOpcode,
		unsigned Idx,
		TTI::TargetCostKind CostKind) override {
		return Impl.getConstBuildVectorInstrCost(VecTy, UserOpcode, Idx, CostKind);
		}
		InstructionCost
getReplicationShuffleCost(Type *EltTy, int ReplicationFactor, int VF,		getReplicationShuffleCost(Type *EltTy, int ReplicationFactor, int VF,
const APInt &DemandedDstElts,		const APInt &DemandedDstElts,
TTI::TargetCostKind CostKind) override {		TTI::TargetCostKind CostKind) override {
return Impl.getReplicationShuffleCost(EltTy, ReplicationFactor, VF,		return Impl.getReplicationShuffleCost(EltTy, ReplicationFactor, VF,
DemandedDstElts, CostKind);		DemandedDstElts, CostKind);
}		}
InstructionCost getMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,		InstructionCost getMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
unsigned AddressSpace,		unsigned AddressSpace,
(287 lines not shown)

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

(560 lines not shown)	InstructionCost getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,
return 1;		return 1;
}		}

InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,		InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
unsigned Index) const {		unsigned Index) const {
return 1;		return 1;
}		}

		InstructionCost
		getConstBuildVectorInstrCost(VectorType *VecTy, unsigned UserOpcode,
		unsigned Idx,
		TTI::TargetCostKind CostKind) const {
		return TTI::TCC_Free;
		}

unsigned getReplicationShuffleCost(Type *EltTy, int ReplicationFactor, int VF,		unsigned getReplicationShuffleCost(Type *EltTy, int ReplicationFactor, int VF,
const APInt &DemandedDstElts,		const APInt &DemandedDstElts,
TTI::TargetCostKind CostKind) {		TTI::TargetCostKind CostKind) {
return 1;		return 1;
}		}

InstructionCost getMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,		InstructionCost getMemoryOpCost(unsigned Opcode, Type *Src, Align Alignment,
unsigned AddressSpace,		unsigned AddressSpace,
(689 lines not shown)

llvm/include/llvm/CodeGen/BasicTTIImpl.h

(1,141 lines not shown)	public:
InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,		InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
unsigned Index) {		unsigned Index) {
std::pair<InstructionCost, MVT> LT =		std::pair<InstructionCost, MVT> LT =
getTLI()->getTypeLegalizationCost(DL, Val->getScalarType());		getTLI()->getTypeLegalizationCost(DL, Val->getScalarType());

return LT.first;		return LT.first;
}		}

		InstructionCost getConstBuildVectorInstrCost(VectorType *VecTy,
		unsigned UserOpcode,
		unsigned Idx,
		TTI::TargetCostKind CostKind) {
		return TTI::TCC_Free;
		}

InstructionCost getReplicationShuffleCost(Type *EltTy, int ReplicationFactor,		InstructionCost getReplicationShuffleCost(Type *EltTy, int ReplicationFactor,
int VF,		int VF,
const APInt &DemandedDstElts,		const APInt &DemandedDstElts,
TTI::TargetCostKind CostKind) {		TTI::TargetCostKind CostKind) {
assert(DemandedDstElts.getBitWidth() == (unsigned)VF * ReplicationFactor &&		assert(DemandedDstElts.getBitWidth() == (unsigned)VF * ReplicationFactor &&
"Unexpected size of DemandedDstElts.");		"Unexpected size of DemandedDstElts.");

InstructionCost Cost;		InstructionCost Cost;
(1,180 lines not shown)

llvm/lib/Analysis/TargetTransformInfo.cpp

	(846 lines not shown)
	InstructionCost TargetTransformInfo::getVectorInstrCost(unsigned Opcode,			InstructionCost TargetTransformInfo::getVectorInstrCost(unsigned Opcode,
	Type *Val,			Type *Val,
	unsigned Index) const {			unsigned Index) const {
	InstructionCost Cost = TTIImpl->getVectorInstrCost(Opcode, Val, Index);			InstructionCost Cost = TTIImpl->getVectorInstrCost(Opcode, Val, Index);
	assert(Cost >= 0 && "TTI should not produce negative costs!");			assert(Cost >= 0 && "TTI should not produce negative costs!");
	return Cost;			return Cost;
	}			}

				InstructionCost TargetTransformInfo::getConstBuildVectorInstrCost(
				VectorType *VecTy, unsigned UserOpcode, unsigned Idx,
				TTI::TargetCostKind CostKind) const {
				InstructionCost Cost =
				TTIImpl->getConstBuildVectorInstrCost(VecTy, UserOpcode, Idx, CostKind);
				return Cost;
				}

	InstructionCost TargetTransformInfo::getReplicationShuffleCost(			InstructionCost TargetTransformInfo::getReplicationShuffleCost(
	Type *EltTy, int ReplicationFactor, int VF, const APInt &DemandedDstElts,			Type *EltTy, int ReplicationFactor, int VF, const APInt &DemandedDstElts,
	TTI::TargetCostKind CostKind) {			TTI::TargetCostKind CostKind) {
	InstructionCost Cost = TTIImpl->getReplicationShuffleCost(			InstructionCost Cost = TTIImpl->getReplicationShuffleCost(
	EltTy, ReplicationFactor, VF, DemandedDstElts, CostKind);			EltTy, ReplicationFactor, VF, DemandedDstElts, CostKind);
	assert(Cost >= 0 && "TTI should not produce negative costs!");			assert(Cost >= 0 && "TTI should not produce negative costs!");
	return Cost;			return Cost;
	}			}
	(356 lines not shown)

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

(106 lines not shown)	InstructionCost AArch64TTIImpl::getIntImmCostInst(unsigned Opcode, unsigned Idx,
assert(Ty->isIntegerTy());		assert(Ty->isIntegerTy());

unsigned BitSize = Ty->getPrimitiveSizeInBits();		unsigned BitSize = Ty->getPrimitiveSizeInBits();
// There is no cost model for constants with a bit size of 0. Return TCC_Free		// There is no cost model for constants with a bit size of 0. Return TCC_Free
// here, so that constant hoisting will ignore this constant.		// here, so that constant hoisting will ignore this constant.
if (BitSize == 0)		if (BitSize == 0)
return TTI::TCC_Free;		return TTI::TCC_Free;

		// TODO: implement for throughput cost.
		if (CostKind == TTI::TCK_RecipThroughput)
		return TTI::TCC_Free;

unsigned ImmIdx = ~0U;		unsigned ImmIdx = ~0U;
switch (Opcode) {		switch (Opcode) {
default:		default:
return TTI::TCC_Free;		return TTI::TCC_Free;
case Instruction::GetElementPtr:		case Instruction::GetElementPtr:
// Always hoist the base address of a GetElementPtr.		// Always hoist the base address of a GetElementPtr.
if (Idx == 0)		if (Idx == 0)
return 2 * TTI::TCC_Basic;		return 2 * TTI::TCC_Basic;
(2,785 lines not shown)

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp

(380 lines not shown)	if (!FP)
return false;		return false;
return isa<FPToSIInst>(FP);		return isa<FPToSIInst>(FP);
}		}

InstructionCost ARMTTIImpl::getIntImmCostInst(unsigned Opcode, unsigned Idx,		InstructionCost ARMTTIImpl::getIntImmCostInst(unsigned Opcode, unsigned Idx,
const APInt &Imm, Type *Ty,		const APInt &Imm, Type *Ty,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
Instruction *Inst) {		Instruction *Inst) {
		// TODO: implement for throughput cost.
		if (CostKind == TTI::TCK_RecipThroughput)
		return TTI::TCC_Free;

// Division by a constant can be turned into multiplication, but only if we		// Division by a constant can be turned into multiplication, but only if we
// know it's constant. So it's not so much that the immediate is cheap (it's		// know it's constant. So it's not so much that the immediate is cheap (it's
// not), but that the alternative is worse.		// not), but that the alternative is worse.
// FIXME: this is probably unneeded with GlobalISel.		// FIXME: this is probably unneeded with GlobalISel.
if ((Opcode == Instruction::SDiv || Opcode == Instruction::UDiv ||		if ((Opcode == Instruction::SDiv || Opcode == Instruction::UDiv ||
Opcode == Instruction::SRem || Opcode == Instruction::URem) &&		Opcode == Instruction::SRem || Opcode == Instruction::URem) &&
Idx == 1)		Idx == 1)
return 0;		return 0;
(1,991 lines not shown)

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp

(163 lines not shown)	case Intrinsic::ppc_altivec_vperm:
}		}
break;		break;
}		}
return None;		return None;
}		}

InstructionCost PPCTTIImpl::getIntImmCost(const APInt &Imm, Type *Ty,		InstructionCost PPCTTIImpl::getIntImmCost(const APInt &Imm, Type *Ty,
TTI::TargetCostKind CostKind) {		TTI::TargetCostKind CostKind) {
		// TODO: implement for throughput cost.
		if (CostKind == TTI::TCK_RecipThroughput)
		return TTI::TCC_Free;

if (DisablePPCConstHoist)		if (DisablePPCConstHoist)
return BaseT::getIntImmCost(Imm, Ty, CostKind);		return BaseT::getIntImmCost(Imm, Ty, CostKind);

assert(Ty->isIntegerTy());		assert(Ty->isIntegerTy());

unsigned BitSize = Ty->getPrimitiveSizeInBits();		unsigned BitSize = Ty->getPrimitiveSizeInBits();
if (BitSize == 0)		if (BitSize == 0)
return ~0U;		return ~0U;
(1,292 lines not shown)

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h

(99 lines not shown)	public:
InstructionCost getMinMaxReductionCost(VectorType *Ty, VectorType *CondTy,		InstructionCost getMinMaxReductionCost(VectorType *Ty, VectorType *CondTy,
bool IsUnsigned,		bool IsUnsigned,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);

InstructionCost getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,		InstructionCost getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,
Optional<FastMathFlags> FMF,		Optional<FastMathFlags> FMF,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);

		InstructionCost getConstBuildVectorInstrCost(VectorType *VecTy,
		unsigned UserOpcode,
		unsigned Idx,
		TTI::TargetCostKind CostKind);

bool isElementTypeLegalForScalableVector(Type *Ty) const {		bool isElementTypeLegalForScalableVector(Type *Ty) const {
return TLI->isLegalElementTypeForRVV(Ty);		return TLI->isLegalElementTypeForRVV(Ty);
}		}

bool isLegalMaskedLoadStore(Type *DataType, Align Alignment) {		bool isLegalMaskedLoadStore(Type *DataType, Align Alignment) {
if (!ST->hasVInstructions())		if (!ST->hasVInstructions())
return false;		return false;

(156 lines not shown)

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

(364 lines not shown)	RISCVTTIImpl::getArithmeticReductionCost(unsigned Opcode, VectorType *VTy,
// IR Reduction is composed by two vmv and one rvv reduction instruction.		// IR Reduction is composed by two vmv and one rvv reduction instruction.
InstructionCost BaseCost = 2;		InstructionCost BaseCost = 2;
unsigned VL = cast<FixedVectorType>(VTy)->getNumElements();		unsigned VL = cast<FixedVectorType>(VTy)->getNumElements();
if (TTI::requiresOrderedReduction(FMF))		if (TTI::requiresOrderedReduction(FMF))
return (LT.first - 1) + BaseCost + VL;		return (LT.first - 1) + BaseCost + VL;
return (LT.first - 1) + BaseCost + Log2_32_Ceil(VL);		return (LT.first - 1) + BaseCost + Log2_32_Ceil(VL);
}		}

		InstructionCost
		RISCVTTIImpl::getConstBuildVectorInstrCost(VectorType *VecTy,
		unsigned UserOpcode, unsigned Idx,
		TTI::TargetCostKind CostKind) {
		InstructionCost VecCost = 0;
		switch (UserOpcode) {
		case Instruction::Shl:
		case Instruction::LShr:
		case Instruction::AShr:
		if (Idx == 1)
		return TTI::TCC_Free;
		LLVM_FALLTHROUGH;
		default: {
		APInt PseudoAddr = APInt::getAllOnes(DL.getPointerSizeInBits());
		// Add a cost of address load + the cost of the vector load.
		VecCost =
		RISCVMatInt::getIntMatCost(PseudoAddr, DL.getPointerSizeInBits(),
		getST()->getFeatureBits()) +
		getMemoryOpCost(Instruction::Load, VecTy, DL.getABITypeAlign(VecTy),
		/*AddressSpace=*/0, CostKind);
		break;
		}
		}
		return VecCost;
		}

void RISCVTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE,		void RISCVTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE,
TTI::UnrollingPreferences &UP,		TTI::UnrollingPreferences &UP,
OptimizationRemarkEmitter *ORE) {		OptimizationRemarkEmitter *ORE) {
// TODO: More tuning on benchmarks and metrics with changes as needed		// TODO: More tuning on benchmarks and metrics with changes as needed
// would apply to all settings below to enable performance.		// would apply to all settings below to enable performance.


if (ST->enableDefaultUnroll())		if (ST->enableDefaultUnroll())
(87 lines not shown)

llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp

(104 lines not shown)	InstructionCost SystemZTTIImpl::getIntImmCostInst(unsigned Opcode, unsigned Idx,
Instruction *Inst) {		Instruction *Inst) {
assert(Ty->isIntegerTy());		assert(Ty->isIntegerTy());

unsigned BitSize = Ty->getPrimitiveSizeInBits();		unsigned BitSize = Ty->getPrimitiveSizeInBits();
// There is no cost model for constants with a bit size of 0. Return TCC_Free		// There is no cost model for constants with a bit size of 0. Return TCC_Free
// here, so that constant hoisting will ignore this constant.		// here, so that constant hoisting will ignore this constant.
if (BitSize == 0)		if (BitSize == 0)
return TTI::TCC_Free;		return TTI::TCC_Free;

		// TODO: implement for throughput cost.
		if (CostKind == TTI::TCK_RecipThroughput)
		return TTI::TCC_Free;

// No cost model for operations on integers larger than 64 bit implemented yet.		// No cost model for operations on integers larger than 64 bit implemented yet.
if (BitSize > 64)		if (BitSize > 64)
return TTI::TCC_Free;		return TTI::TCC_Free;

switch (Opcode) {		switch (Opcode) {
default:		default:
return TTI::TCC_Free;		return TTI::TCC_Free;
case Instruction::GetElementPtr:		case Instruction::GetElementPtr:
(1,124 lines not shown)

llvm/lib/Target/X86/X86TargetTransformInfo.h

(142 lines not shown)	InstructionCost getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);		const Instruction *I = nullptr);
InstructionCost getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,		InstructionCost getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy,
CmpInst::Predicate VecPred,		CmpInst::Predicate VecPred,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);		const Instruction *I = nullptr);
InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,		InstructionCost getVectorInstrCost(unsigned Opcode, Type *Val,
unsigned Index);		unsigned Index);
		InstructionCost getConstBuildVectorInstrCost(VectorType *VecTy,
		unsigned UserOpcode,
		unsigned Idx,
		TTI::TargetCostKind CostKind);
InstructionCost getScalarizationOverhead(VectorType *Ty,		InstructionCost getScalarizationOverhead(VectorType *Ty,
const APInt &DemandedElts,		const APInt &DemandedElts,
bool Insert, bool Extract);		bool Insert, bool Extract);
InstructionCost getReplicationShuffleCost(Type *EltTy, int ReplicationFactor,		InstructionCost getReplicationShuffleCost(Type *EltTy, int ReplicationFactor,
int VF,		int VF,
const APInt &DemandedDstElts,		const APInt &DemandedDstElts,
TTI::TargetCostKind CostKind);		TTI::TargetCostKind CostKind);
InstructionCost getMemoryOpCost(unsigned Opcode, Type *Src,		InstructionCost getMemoryOpCost(unsigned Opcode, Type *Src,
(115 lines not shown)

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

(3,769 lines not shown)	InstructionCost X86TTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,
// Add to the base cost if we know that the extracted element of a vector is		// Add to the base cost if we know that the extracted element of a vector is
// destined to be moved to and used in the integer register file.		// destined to be moved to and used in the integer register file.
if (Opcode == Instruction::ExtractElement && ScalarType->isPointerTy())		if (Opcode == Instruction::ExtractElement && ScalarType->isPointerTy())
RegisterFileMoveCost += 1;		RegisterFileMoveCost += 1;

return BaseT::getVectorInstrCost(Opcode, Val, Index) + RegisterFileMoveCost;		return BaseT::getVectorInstrCost(Opcode, Val, Index) + RegisterFileMoveCost;
}		}

		InstructionCost
		X86TTIImpl::getConstBuildVectorInstrCost(VectorType *VecTy, unsigned UserOpcode,
		unsigned Idx,
		TTI::TargetCostKind CostKind) {
		assert(CostKind == TTI::TCK_RecipThroughput &&
		"Expected only TTI::TCK_RecipThroughput currently.");
		Type *ScalarTy = VecTy->getElementType();
		TypeSize Sz = DL.getTypeSizeInBits(ScalarTy);
		if (CostKind == TTI::TCK_RecipThroughput &&
		(Sz > 64 || (Sz > 32 && ST->is32Bit()) || (Sz > 16 && ST->is16Bit())))
		vdmitrie (inline comment, Not Done): Note that it is free for a splat shift value only. https://godbolt.org/z/KMf6sW5n3
		RKSimon (inline comment, Not Done): Yes, vector shifts must be splats without AVX2 or XOP.
		return X86TTIImpl::getMemoryOpCost(Instruction::Load, VecTy,
		DL.getABITypeAlign(VecTy),
		/*AddressSpace=*/0, CostKind);

		// Even if the const value is read from the memory, for many instructions it
		// is free, since they have a data-from-memory form of the vector
		// instructions.
		switch (UserOpcode) {
		case Instruction::Mul:
		case Instruction::SDiv:
		case Instruction::UDiv:
		case Instruction::Shl:
		case Instruction::LShr:
		RKSimon (inline comment, Not Done): XOP can memory fold from Idx == 0 as well.
		case Instruction::AShr:
		case Instruction::FAdd:
		case Instruction::FSub:
		case Instruction::FMul:
		case Instruction::FDiv:
		case Instruction::FCmp:
		if (Idx == 1)
		RKSimon (inline comment, Not Done): Instruction::Add/Sub? Also, we'd need to allow Idx == 0 || Idx == 1 for commutable ops.
		ABataev (inline comment, Author, Done): 1. I excluded Add/Sub here because scalar Add/Sub with an immediate has a lower cost than the vector Add/Sub (0.2-0.33 vs ~0.5). 2. We can add it later; currently there is no such analysis in getIntImmCostInst.
		return TTI::TCC_Free;
		break;
		default:
		break;
		}
		return getMemoryOpCost(Instruction::Load, VecTy, DL.getABITypeAlign(VecTy),
		/*AddressSpace=*/0, CostKind);
		}
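
The decision made by the X86 hook above can be sketched outside LLVM as a small standalone function. This is only a toy model under stated assumptions: `Op`, `Target`, `vectorLoadCost` and `constBuildVectorCost` are hypothetical names introduced here, and the single integer load cost stands in for whatever `getMemoryOpCost` would return for the vector type.

```cpp
#include <cassert>

// Hypothetical stand-in for the user instruction's opcode; this is not the
// real llvm::Instruction opcode enumeration.
enum class Op { Add, Sub, Mul, SDiv, UDiv, Shl, LShr, AShr,
                FAdd, FSub, FMul, FDiv, FCmp };

// Hypothetical target description.
struct Target {
  bool is32Bit;            // models ST->is32Bit()
  bool is16Bit;            // models ST->is16Bit()
  unsigned vectorLoadCost; // assumed stand-in for getMemoryOpCost(Load, VecTy, ...)
};

// Returns 0 (free) when the constant vector is assumed to fold into a memory
// operand of its user, and the assumed vector load cost otherwise.
unsigned constBuildVectorCost(const Target &T, unsigned eltBits, Op userOp,
                              unsigned idx) {
  // Elements wider than the native scalar width are always charged the load.
  if (eltBits > 64 || (eltBits > 32 && T.is32Bit) ||
      (eltBits > 16 && T.is16Bit))
    return T.vectorLoadCost;

  switch (userOp) {
  case Op::Mul:
  case Op::SDiv:
  case Op::UDiv:
  case Op::Shl:
  case Op::LShr:
  case Op::AShr:
  case Op::FAdd:
  case Op::FSub:
  case Op::FMul:
  case Op::FDiv:
  case Op::FCmp:
    // Only the second operand is treated as foldable in this diff.
    if (idx == 1)
      return 0;
    break;
  default:
    break;
  }
  return T.vectorLoadCost;
}
```

With a 64-bit target and a load cost of 1, a constant used as operand 1 of a `Mul` is modelled as free, while the same constant feeding an `Add`, or sitting in operand 0, is charged the load. The reviewers' points about splat-only shifts, XOP and commutable operands are not modelled here.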

InstructionCost X86TTIImpl::getScalarizationOverhead(VectorType *Ty,		InstructionCost X86TTIImpl::getScalarizationOverhead(VectorType *Ty,
const APInt &DemandedElts,		const APInt &DemandedElts,
bool Insert,		bool Insert,
bool Extract) {		bool Extract) {
assert(DemandedElts.getBitWidth() ==		assert(DemandedElts.getBitWidth() ==
cast<FixedVectorType>(Ty)->getNumElements() &&		cast<FixedVectorType>(Ty)->getNumElements() &&
"Vector size mismatch");		"Vector size mismatch");

(282 lines not shown)	InstructionCost X86TTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src,

InstructionCost Cost = 0;		InstructionCost Cost = 0;

// Source of truth: how many elements were there in the original IR vector?		// Source of truth: how many elements were there in the original IR vector?
const unsigned SrcNumElt = VTy->getNumElements();		const unsigned SrcNumElt = VTy->getNumElements();

// How far have we gotten?		// How far have we gotten?
int NumEltRemaining = SrcNumElt;		int NumEltRemaining = SrcNumElt;
// Note that we intentionally capture by-reference, NumEltRemaining changes.		// Note that we intentionally capture by-reference, NumEltRemaining changes.
		RKSimon (inline comment, Not Done): Just do this once before the (!VTy || !LT.second.isVector()) check?
auto NumEltDone = [&]() { return SrcNumElt - NumEltRemaining; };		auto NumEltDone = [&]() { return SrcNumElt - NumEltRemaining; };

const int MaxLegalOpSizeBytes = divideCeil(LT.second.getSizeInBits(), 8);		const int MaxLegalOpSizeBytes = divideCeil(LT.second.getSizeInBits(), 8);

// Note that even if we can store 64 bits of an XMM, we still operate on XMM.		// Note that even if we can store 64 bits of an XMM, we still operate on XMM.
const unsigned XMMBits = 128;		const unsigned XMMBits = 128;
if (XMMBits % EltTyBits != 0)		if (XMMBits % EltTyBits != 0)
// Vector size must be a multiple of the element size. I.e. no padding.		// Vector size must be a multiple of the element size. I.e. no padding.
(786 lines not shown)	InstructionCost X86TTIImpl::getIntImmCostInst(unsigned Opcode, unsigned Idx,
assert(Ty->isIntegerTy());		assert(Ty->isIntegerTy());

unsigned BitSize = Ty->getPrimitiveSizeInBits();		unsigned BitSize = Ty->getPrimitiveSizeInBits();
// There is no cost model for constants with a bit size of 0. Return TCC_Free		// There is no cost model for constants with a bit size of 0. Return TCC_Free
// here, so that constant hoisting will ignore this constant.		// here, so that constant hoisting will ignore this constant.
if (BitSize == 0)		if (BitSize == 0)
return TTI::TCC_Free;		return TTI::TCC_Free;

		uint64_t Val = Imm.getLimitedValue();
		if (CostKind == TTI::TCK_RecipThroughput &&
		(Imm.getActiveBits() > 64 || (!isInt<32>(Val) && ST->is32Bit()) ||
		(!isInt<16>(Val) && ST->is16Bit())))
		return X86TTIImpl::getMemoryOpCost(Instruction::Load, Ty,
		DL.getABITypeAlign(Ty),
		/*AddressSpace=*/0, CostKind);

unsigned ImmIdx = ~0U;		unsigned ImmIdx = ~0U;
switch (Opcode) {		switch (Opcode) {
default:		default:
return TTI::TCC_Free;		return TTI::TCC_Free;
case Instruction::GetElementPtr:		case Instruction::GetElementPtr:
// Always hoist the base address of a GetElementPtr. This prevents the		// Always hoist the base address of a GetElementPtr. This prevents the
// creation of new constants for every base constant that gets constant		// creation of new constants for every base constant that gets constant
// folded with the offset.		// folded with the offset.
(60 lines not shown)	InstructionCost X86TTIImpl::getIntImmCostInst(unsigned Opcode, unsigned Idx,
case Instruction::PHI:		case Instruction::PHI:
case Instruction::Call:		case Instruction::Call:
case Instruction::Select:		case Instruction::Select:
case Instruction::Ret:		case Instruction::Ret:
case Instruction::Load:		case Instruction::Load:
break;		break;
}		}

		// If CostKind is throughput and ImmIdx == Idx, the cost is free. Otherwise,
		// load the const from memory.
		if (CostKind == TTI::TCK_RecipThroughput) {
		if (Idx == ImmIdx)
		return TTI::TCC_Free;
		return X86TTIImpl::getMemoryOpCost(Instruction::Load, Ty,
		DL.getABITypeAlign(Ty),
		/*AddressSpace=*/0, CostKind);
		}

if (Idx == ImmIdx) {		if (Idx == ImmIdx) {
int NumConstants = divideCeil(BitSize, 64);		int NumConstants = divideCeil(BitSize, 64);
InstructionCost Cost = X86TTIImpl::getIntImmCost(Imm, Ty, CostKind);		InstructionCost Cost = X86TTIImpl::getIntImmCost(Imm, Ty, CostKind);
return (Cost <= NumConstants * TTI::TCC_Basic)		return (Cost <= NumConstants * TTI::TCC_Basic)
? static_cast<int>(TTI::TCC_Free)		? static_cast<int>(TTI::TCC_Free)
: Cost;		: Cost;
}		}

(983 lines not shown)

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

(5,650 lines not shown)	if (P0 == AltP0Swapped)
(I != MainOp &&		(I != MainOp &&
!areCompatibleCmpOps(CI0->getOperand(0), CI0->getOperand(1),		!areCompatibleCmpOps(CI0->getOperand(0), CI0->getOperand(1),
CI->getOperand(0), CI->getOperand(1)));		CI->getOperand(0), CI->getOperand(1)));
return AltP0 == CurrentPred || AltP0Swapped == CurrentPred;		return AltP0 == CurrentPred || AltP0Swapped == CurrentPred;
}		}
return I->getOpcode() == AltOp->getOpcode();		return I->getOpcode() == AltOp->getOpcode();
}		}

		static InstructionCost getCostForConstants(TargetTransformInfo &TTI,
		const DataLayout &DL,
		ArrayRef<Value *> VL,
		unsigned UserOpcode,
		unsigned UserIdx) {
		Type *ScalarTy = VL.front()->getType();
		unsigned VF = VL.size();
		auto *VecTy = FixedVectorType::get(ScalarTy, VF);
		InstructionCost VecCost = TTI.getConstBuildVectorInstrCost(
		VecTy, UserOpcode, UserIdx, TTI::TCK_RecipThroughput);
		InstructionCost ScalarCost = 0;
		if (!ScalarTy->isIntegerTy()) {
		ScalarCost +=
		TTI.getMemoryOpCost(Instruction::Load, ScalarTy,
		DL.getABITypeAlign(ScalarTy),
		/*AddressSpace=*/0, TTI::TCK_RecipThroughput) *
		VF;
		} else {
		// Be conservative if the data type is larger than the largest legal int
		// type.
		for (Value *V : VL) {
		if (isa<UndefValue>(V))
		continue;
		auto *CI = cast<ConstantInt>(V);
		ScalarCost += TTI.getIntImmCostInst(UserOpcode, UserIdx, CI->getValue(),
		ScalarTy, TTI::TCK_RecipThroughput);
		}
		}
		return VecCost - ScalarCost;
		}
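
The value returned by getCostForConstants is the vector cost minus the summed scalar cost, so a negative result favours vectorization. A minimal standalone sketch of that arithmetic follows. `ToyCosts`, `Lane` and `costForConstants` are hypothetical names, and the integer fields are assumed stand-ins for the TTI queries in the diff.

```cpp
#include <cassert>
#include <vector>

// Assumed, simplified cost inputs; the real values come from TTI queries.
struct ToyCosts {
  int vectorBuildCost;  // stand-in for getConstBuildVectorInstrCost(...)
  int scalarFPLoadCost; // stand-in for getMemoryOpCost(Load, ScalarTy, ...)
};

// One entry per vector lane.
struct Lane {
  bool isUndef; // undef lanes contribute no scalar cost
  int immCost;  // stand-in for getIntImmCostInst(...) on this lane's constant
};

// Mirrors the shape of the helper in the diff: non-integer scalars are each
// charged one load, integer scalars are charged their immediate cost.
int costForConstants(const ToyCosts &C, bool isIntegerTy,
                     const std::vector<Lane> &lanes) {
  int scalarCost = 0;
  if (!isIntegerTy) {
    scalarCost = C.scalarFPLoadCost * static_cast<int>(lanes.size());
  } else {
    for (const Lane &L : lanes) {
      if (L.isUndef)
        continue;
      scalarCost += L.immCost;
    }
  }
  return C.vectorBuildCost - scalarCost;
}
```

For example, four floating-point lanes with a scalar load cost of 1 and a vector build cost of 1 give 1 - 4 = -3, whereas four integer lanes whose immediates are free give +1, which makes the constant vector a net cost.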

InstructionCost BoUpSLP::getEntryCost(const TreeEntry *E,		InstructionCost BoUpSLP::getEntryCost(const TreeEntry *E,
ArrayRef<Value *> VectorizedVals) {		ArrayRef<Value *> VectorizedVals) {
ArrayRef<Value*> VL = E->Scalars;		ArrayRef<Value*> VL = E->Scalars;

Type *ScalarTy = VL[0]->getType();		Type *ScalarTy = VL[0]->getType();
if (StoreInst *SI = dyn_cast<StoreInst>(VL[0]))		if (StoreInst *SI = dyn_cast<StoreInst>(VL[0]))
ScalarTy = SI->getValueOperand()->getType();		ScalarTy = SI->getValueOperand()->getType();
else if (CmpInst *CI = dyn_cast<CmpInst>(VL[0]))		else if (CmpInst *CI = dyn_cast<CmpInst>(VL[0]))
(89 lines not shown)	for (const auto &Data : ExtractVectorsTys) {
}		}
} else {		} else {
Cost += TTIRef.getShuffleCost(TargetTransformInfo::SK_InsertSubvector,		Cost += TTIRef.getShuffleCost(TargetTransformInfo::SK_InsertSubvector,
VecTy, None, 0, EEVTy);		VecTy, None, 0, EEVTy);
}		}
}		}
};		};
if (E->State == TreeEntry::NeedToGather) {		if (E->State == TreeEntry::NeedToGather) {
if (allConstant(VL))		if (allConstant(VL)) {
		// For reduced constants no need to estimate the cost.
		// FIXME: need to emit an accumulated constant val instead of building a
		// graph.
		if (E->UserTreeIndices.empty())
return 0;		return 0;
		if (all_of(VL, [](Value *V) {
		if (isa<UndefValue>(V))
		return true;
		auto *C = cast<Constant>(V);
		return C->isZeroValue();
		}))
		return 0;
		// TODO: improve opcode and idx for alternate opcodes.
		unsigned UserOpcode =
		E->UserTreeIndices.front().UserTE->isAltShuffle()
		vdmitrie (inline comment, Not Done): This estimate should be a bit more complicated. Here are the things that can additionally be considered. For scalar floating-point ops, a constant operand is normally loaded from memory too. If it is an operand of an instruction where it becomes an immediate (like a shift value) and it is a splat, the cost is zero. For a scalar integer op, a constant operand is typically an immediate, so this estimate works in most cases, but there is an exception: 64-bit operations on a 32-bit target. That should be taken into account too.
		? 0
		: E->UserTreeIndices.front().UserTE->getOpcode();
		unsigned UserIdx = E->UserTreeIndices.front().UserTE->isAltShuffle()
		? 0
		vdmitrie: Just wondering: is it possible for UserTreeIndices to be empty here? AFAIU it can be empty for the root only, but constants do not seed the vtree. Also, if the alternate opcodes are shl/shr but the shift value is a splat, it can still be an immediate for both of them.
		ABataev: 1. If constants are reduced values in reduction ops. 2. That's why there is a TODO above.
		vdmitrie: Okay. Although I believe it is not the SLP vectorizer's job to do constant folding.
		ABataev: Do you suggest hiding it in getConstBuildVectorInstrCost and returning the difference? Or just adding a new member function?
		vdmitrie: Alternate opcodes are SLP vectorizer specific. For that reason, trying to sink that logic into the TTI interface does not look like the right thing to do. But outlining this whole new code into a separate member is a good idea. What sounds weird to me is that constants may seed the vtree for reduction. That is not directly related to this patch, but you are placing workarounds for it here. IMO it is impractical to run constant reductions through the SLP vectorizer machinery. Probably, to make the workaround for that issue simpler in this patch, add an early return: if (E->UserTreeIndices.empty()) return 0; Otherwise it will be returning a memory-op cost for a foldable operation.
		ABataev: > What sounds weird for me is that constants may seed vtree for reduction.
		InstCombiner and other passes are not always able to handle them (or require some extra work and compile time). E.g.:
			define i32 @foo(i32 %v, i32 %a) {
			  %s1 = add i32 %v, 1
			  %s2 = add i32 %a, 2
			  %s3 = add i32 %s1, %s2
			  %s11 = add i32 %v, %a
			  %s31 = add i32 %s3, %s11
			  %s4 = add i32 %v, 3
			  %s5 = add i32 %a, 4
			  %s6 = add i32 %s4, %s5
			  %s7 = add i32 %s31, %s6
			  ret i32 %s7
			}
		SLP is able to transform it to:
			define i32 @foo(i32 %v, i32 %a) {
			  %1 = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> <i32 4, i32 3, i32 2, i32 1>)
			  %op.rdx = add i32 %a, %a
			  %op.rdx1 = add i32 %a, %v
			  %op.rdx2 = add i32 %v, %v
			  %op.rdx3 = add i32 %op.rdx, %op.rdx1
			  %op.rdx4 = add i32 %op.rdx3, %op.rdx2
			  %op.rdx5 = add i32 %1, %op.rdx4
			  ret i32 %op.rdx5
			}
		which can be optimized to %1 = i32 10. But I agree that it requires improvement. We don't need to estimate the cost and emit a reduction here. I have a patch that improves it. I need to work on it for some time, though.
		: E->UserTreeIndices.front().EdgeIdx;
		return getCostForConstants(TTI, DL, VL, UserOpcode, UserIdx);
		}
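
The considerations raised in the review thread above (floating-point constants are usually loaded from memory, a splat shift amount can stay an immediate, a constant with no user should not be charged a memory cost) can be sketched as a small standalone heuristic. This is only an illustration under stated assumptions: `UserKind`, `ConstOperandQuery`, `scalarConstantCost`, `vectorConstantCost` and the unit `LoadCost` are hypothetical names and values, not the code from this patch and not an LLVM API.

```cpp
// Hypothetical, simplified model; none of these types exist in LLVM.
enum class UserKind { None, IntArith, FPArith, Shift };

struct ConstOperandQuery {
  UserKind User;         // kind of instruction that consumes the constant
  bool IsSplat;          // all lanes hold the same value
  bool IsLegalImmediate; // scalar constant fits an immediate on the target
};

// Assumed unit cost of one load from the constant pool.
constexpr int LoadCost = 1;

// Estimated cost of ONE scalar constant operand before vectorization.
int scalarConstantCost(const ConstOperandQuery &Q) {
  if (Q.User == UserKind::None)
    return 0; // no user: treated as foldable, so nothing is charged
  if (Q.User == UserKind::FPArith)
    return LoadCost; // FP constants are normally loaded from memory
  // Integer constants are free only when they are legal immediates
  // (for example, not a 64-bit operand on a 32-bit target).
  return Q.IsLegalImmediate ? 0 : LoadCost;
}

// Estimated cost of the constant build vector after vectorization.
int vectorConstantCost(const ConstOperandQuery &Q) {
  if (Q.User == UserKind::None)
    return 0; // mirrors the suggested early return for an empty user list
  if (Q.User == UserKind::Shift && Q.IsSplat)
    return 0; // a splat shift amount can remain an immediate
  return LoadCost; // otherwise assume a vector load from the constant pool
}
```

In this model the benefit of vectorizing a bundle of constants would be the difference between `vectorConstantCost` and the sum of `scalarConstantCost` over the lanes.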
if (isa<InsertElementInst>(VL[0]))		if (isa<InsertElementInst>(VL[0]))
return InstructionCost::getInvalid();		return InstructionCost::getInvalid();
SmallVector<int> Mask;		SmallVector<int> Mask;
SmallVector<const TreeEntry *> Entries;		SmallVector<const TreeEntry *> Entries;
Optional<TargetTransformInfo::ShuffleKind> Shuffle =		Optional<TargetTransformInfo::ShuffleKind> Shuffle =
isGatherShuffledEntry(E, Mask, Entries);		isGatherShuffledEntry(E, Mask, Entries);
if (Shuffle.hasValue()) {		if (Shuffle.hasValue()) {
InstructionCost GatherCost = 0;		InstructionCost GatherCost = 0;
if (ShuffleVectorInst::isIdentityMask(Mask)) {		if (ShuffleVectorInst::isIdentityMask(Mask)) {
// Perfect match in the graph, will reuse the previously vectorized		// Perfect match in the graph, will reuse the previously vectorized
// node. Cost is 0.		// node. Cost is 0.
LLVM_DEBUG(		LLVM_DEBUG(
dbgs()		dbgs()
<< "SLP: perfect diamond match for gather bundle that starts with "		<< "SLP: perfect diamond match for gather bundle that starts with "
<< *VL.front() << ".\n");		<< *VL.front() << ".\n");
if (NeedToShuffleReuses)		if (NeedToShuffleReuses)
		vdmitrie: Drop it?
		ABataev: What do you mean?
		vdmitrie: Drop the extra definition of ScalarCost. Otherwise the loop at line 5807 is updating the variable from 5802, but it is not used. Line 5811 will subtract the one defined at 5795.
		ABataev: Ah, yes, sure.
GatherCost =		GatherCost =
TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,		TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc,
FinalVecTy, E->ReuseShuffleIndices);		FinalVecTy, E->ReuseShuffleIndices);
} else {		} else {
LLVM_DEBUG(dbgs() << "SLP: shuffled " << Entries.size()		LLVM_DEBUG(dbgs() << "SLP: shuffled " << Entries.size()
		vdmitrie: Doesn't this interface already assume that a constant is a legal immediate? I was trying to explore this too, and I found that it does not seem to correctly cover 32-bit targets, specifically for 64-bit operations. Ideally we should have an interface that tells whether an immediate is a legal imm operand for a target, but I have not found anything like that. One way to figure this out (which may be wrong) is: when the condition DL->getTypeStoreSizeInBits(ScalarTy) > DL->getLargestLegalIntTypeSizeInBits() is true, we cannot assume the operand is a legal immediate.
		ABataev: I'll check it.
<< " entries for bundle that starts with "		<< " entries for bundle that starts with "
<< *VL.front() << ".\n");		<< *VL.front() << ".\n");
// Detected that instead of gather we can emit a shuffle of single/two		// Detected that instead of gather we can emit a shuffle of single/two
// previously vectorized nodes. Add the cost of the permutation rather		// previously vectorized nodes. Add the cost of the permutation rather
// than gather.		// than gather.
::addMask(Mask, E->ReuseShuffleIndices);		::addMask(Mask, E->ReuseShuffleIndices);
GatherCost = TTI->getShuffleCost(*Shuffle, FinalVecTy, Mask);		GatherCost = TTI->getShuffleCost(*Shuffle, FinalVecTy, Mask);
}		}
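
The width comparison proposed in the review (do not assume a legal immediate when the scalar's store size exceeds the largest legal integer width) can be sketched without any LLVM types. The helper `mayBeLegalImmediate` and its plain integer parameters are hypothetical stand-ins for the values of `DL->getTypeStoreSizeInBits(ScalarTy)` and `DL->getLargestLegalIntTypeSizeInBits()`; the reviewer notes that this condition may be wrong, so it should be read as an approximation rather than a target query.

```cpp
// Hypothetical helper: returns false when a scalar integer constant is wider
// than the largest legal integer type, in which case it should not be assumed
// to be a free immediate (for example, a 64-bit operand on a 32-bit target).
bool mayBeLegalImmediate(unsigned TypeStoreSizeInBits,
                         unsigned LargestLegalIntBits) {
  return TypeStoreSizeInBits <= LargestLegalIntBits;
}
```

A caller would charge a memory-op cost for the scalar constant whenever this returns false, instead of treating it as a zero-cost immediate.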

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll

	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[C:%.*]], i64 1			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[C:%.*]], i64 1
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[FNEG]], i64 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[FNEG]], i64 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[C]], i64 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[C]], i64 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[B]], i64 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[B]], i64 1
	; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[MUL]], i64 0			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[MUL]], i64 0
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> poison, <2 x i32> zeroinitializer			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP6]]
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP7]], i64 1			; CHECK-NEXT: [[TMP8:%.*]] = fcmp olt <2 x double> [[TMP7]], <double 0x3EB0C6F7A0B5ED8D, double 0x3EB0C6F7A0B5ED8D>
	; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[TMP8]], 0x3EB0C6F7A0B5ED8D			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x i1> [[TMP8]], i64 0
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x double> [[TMP7]], i64 0			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP8]], i64 1
	; CHECK-NEXT: [[CMP4:%.*]] = fcmp olt double [[TMP9]], 0x3EB0C6F7A0B5ED8D			; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[TMP10]], i1 [[TMP9]], i1 false
	; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[CMP]], i1 [[CMP4]], i1 false
	; CHECK-NEXT: br i1 [[OR_COND]], label [[CLEANUP:%.*]], label [[LOR_LHS_FALSE:%.*]]			; CHECK-NEXT: br i1 [[OR_COND]], label [[CLEANUP:%.*]], label [[LOR_LHS_FALSE:%.*]]
	; CHECK: lor.lhs.false:			; CHECK: lor.lhs.false:
	; CHECK-NEXT: [[TMP10:%.*]] = fcmp ule <2 x double> [[TMP7]], <double 1.000000e+00, double 1.000000e+00>			; CHECK-NEXT: [[TMP11:%.*]] = fcmp ule <2 x double> [[TMP7]], <double 1.000000e+00, double 1.000000e+00>
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i1> [[TMP10]], i64 0			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i1> [[TMP11]], i64 0
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i1> [[TMP10]], i64 1			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x i1> [[TMP11]], i64 1
	; CHECK-NEXT: [[OR_COND1:%.*]] = select i1 [[TMP12]], i1 true, i1 [[TMP11]]			; CHECK-NEXT: [[OR_COND1:%.*]] = select i1 [[TMP13]], i1 true, i1 [[TMP12]]
	; CHECK-NEXT: br label [[CLEANUP]]			; CHECK-NEXT: br label [[CLEANUP]]
	; CHECK: cleanup:			; CHECK: cleanup:
	; CHECK-NEXT: [[RETVAL_0:%.*]] = phi i1 [ false, [[ENTRY:%.*]] ], [ [[OR_COND1]], [[LOR_LHS_FALSE]] ]			; CHECK-NEXT: [[RETVAL_0:%.*]] = phi i1 [ false, [[ENTRY:%.*]] ], [ [[OR_COND1]], [[LOR_LHS_FALSE]] ]
	; CHECK-NEXT: ret i1 [[RETVAL_0]]			; CHECK-NEXT: ret i1 [[RETVAL_0]]
	;			;
	entry:			entry:
	%fneg = fneg double %b			%fneg = fneg double %b
	%add = fadd double %fneg, %c			%add = fadd double %fneg, %c

llvm/test/Transforms/PhaseOrdering/fast-basictest.ll

Show First 20 Lines • Show All 78 Lines • ▼ Show 20 Lines	;
%B = fmul reassoc nsz float %X1, 47. ; X1*47		%B = fmul reassoc nsz float %X1, 47. ; X1*47
%C = fmul reassoc nsz float %X2, -47. ; X2*-47		%C = fmul reassoc nsz float %X2, -47. ; X2*-47
%D = fadd reassoc nsz float %B, %C ; X1*47 + X2*-47 -> 47*(X1-X2)		%D = fadd reassoc nsz float %B, %C ; X1*47 + X2*-47 -> 47*(X1-X2)
ret float %D		ret float %D
}		}

; TODO: This doesn't require 'nsz'. It should fold to ((x1 - x2) * 47.0)		; TODO: This doesn't require 'nsz'. It should fold to ((x1 - x2) * 47.0)
define float @test13_reassoc(float %X1, float %X2) {		define float @test13_reassoc(float %X1, float %X2) {
; CHECK-LABEL: @test13_reassoc(		; REASSOC_AND_IC-LABEL: @test13_reassoc(
; CHECK-NEXT: [[B:%.*]] = fmul reassoc float [[X1:%.*]], 4.700000e+01		; REASSOC_AND_IC-NEXT: [[B:%.*]] = fmul reassoc float [[X1:%.*]], 4.700000e+01
; CHECK-NEXT: [[C:%.*]] = fmul reassoc float [[X2:%.*]], 4.700000e+01		; REASSOC_AND_IC-NEXT: [[C:%.*]] = fmul reassoc float [[X2:%.*]], 4.700000e+01
; CHECK-NEXT: [[TMP1:%.*]] = fsub reassoc float [[B]], [[C]]		; REASSOC_AND_IC-NEXT: [[TMP1:%.*]] = fsub reassoc float [[B]], [[C]]
; CHECK-NEXT: ret float [[TMP1]]		; REASSOC_AND_IC-NEXT: ret float [[TMP1]]
		;
		; O2-LABEL: @test13_reassoc(
		; O2-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[X1:%.*]], i64 0
		; O2-NEXT: [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[X2:%.*]], i64 1
		; O2-NEXT: [[TMP3:%.*]] = fmul reassoc <2 x float> [[TMP2]], <float 4.700000e+01, float 4.700000e+01>
		; O2-NEXT: [[SHIFT:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <2 x i32> <i32 1, i32 undef>
		; O2-NEXT: [[TMP4:%.*]] = fsub reassoc <2 x float> [[TMP3]], [[SHIFT]]
		; O2-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i64 0
		; O2-NEXT: ret float [[TMP5]]
;		;
%B = fmul reassoc float %X1, 47. ; X1*47		%B = fmul reassoc float %X1, 47. ; X1*47
%C = fmul reassoc float %X2, -47. ; X2*-47		%C = fmul reassoc float %X2, -47. ; X2*-47
%D = fadd reassoc float %B, %C ; X1*47 + X2*-47 -> 47*(X1-X2)		%D = fadd reassoc float %B, %C ; X1*47 + X2*-47 -> 47*(X1-X2)
ret float %D		ret float %D
}		}

; (b+(a+1234))+-a -> b+1234		; (b+(a+1234))+-a -> b+1234

llvm/test/Transforms/SLPVectorizer/AArch64/memory-runtime-checks.ll

	bb23:			bb23:
	ret void			ret void
	}			}

	; In this test there's a single bound, do not generate runtime checks.			; In this test there's a single bound, do not generate runtime checks.
	define void @single_membound(double* %arg, double* %arg1, double %x) {			define void @single_membound(double* %arg, double* %arg1, double %x) {
	; CHECK-LABEL: @single_membound(			; CHECK-LABEL: @single_membound(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP:%.*]] = fsub double [[X:%.*]], 9.900000e+01
	; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds double, double* [[ARG:%.*]], i64 1			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds double, double* [[ARG:%.*]], i64 1
	; CHECK-NEXT: store double [[TMP]], double* [[TMP9]], align 8
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds double, double* [[ARG1:%.*]], i64 0			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds double, double* [[ARG1:%.*]], i64 0
				; CHECK-NEXT: [[TMP:%.*]] = fsub double [[X:%.*]], 9.900000e+01
				; CHECK-NEXT: store double [[TMP]], double* [[TMP9]], align 8
	; CHECK-NEXT: [[TMP12:%.*]] = load double, double* [[TMP10]], align 8			; CHECK-NEXT: [[TMP12:%.*]] = load double, double* [[TMP10]], align 8
	; CHECK-NEXT: [[TMP13:%.*]] = fsub double 1.000000e+00, [[TMP12]]			; CHECK-NEXT: [[TMP13:%.*]] = fsub double 1.000000e+00, [[TMP12]]
	; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds double, double* [[ARG]], i64 2
	; CHECK-NEXT: br label [[BB15:%.*]]			; CHECK-NEXT: br label [[BB15:%.*]]
	; CHECK: bb15:			; CHECK: bb15:
	; CHECK-NEXT: [[TMP16:%.*]] = fmul double [[TMP]], 2.000000e+01			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[TMP]], i32 0
	; CHECK-NEXT: store double [[TMP16]], double* [[TMP9]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[TMP13]], i32 1
	; CHECK-NEXT: [[TMP17:%.*]] = fmul double [[TMP13]], 3.000000e+01			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 2.000000e+01, double 3.000000e+01>
	; CHECK-NEXT: store double [[TMP17]], double* [[TMP14]], align 8			; CHECK-NEXT: [[TMP3:%.*]] = bitcast double* [[TMP9]] to <2 x double>*
				; CHECK-NEXT: store <2 x double> [[TMP2]], <2 x double>* [[TMP3]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%tmp = fsub double %x, 99.0			%tmp = fsub double %x, 99.0
	%tmp9 = getelementptr inbounds double, double* %arg, i64 1			%tmp9 = getelementptr inbounds double, double* %arg, i64 1
	store double %tmp, double* %tmp9, align 8			store double %tmp, double* %tmp9, align 8
	%tmp10 = getelementptr inbounds double, double* %arg1, i64 0			%tmp10 = getelementptr inbounds double, double* %arg1, i64 0
	%tmp12 = load double, double* %tmp10, align 8			%tmp12 = load double, double* %tmp10, align 8
	▲ Show 20 Lines • Show All 580 Lines • ▼ Show 20 Lines
	; A test case where there are no instructions accessing a tracked object in a			; A test case where there are no instructions accessing a tracked object in a
	; block for which versioning was requested.			; block for which versioning was requested.
	define void @crash_no_tracked_instructions(float** %arg, float* %arg.2, float* %arg.3, i1 %c) {			define void @crash_no_tracked_instructions(float** %arg, float* %arg.2, float* %arg.3, i1 %c) {
	; CHECK-LABEL: @crash_no_tracked_instructions(			; CHECK-LABEL: @crash_no_tracked_instructions(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[T19:%.*]] = load float*, float** [[ARG:%.*]], align 8			; CHECK-NEXT: [[T19:%.*]] = load float*, float** [[ARG:%.*]], align 8
	; CHECK-NEXT: [[T20:%.*]] = load float, float* [[ARG_3:%.*]], align 4			; CHECK-NEXT: [[T20:%.*]] = load float, float* [[ARG_3:%.*]], align 4
	; CHECK-NEXT: [[T21:%.*]] = getelementptr inbounds float, float* [[ARG_2:%.*]], i64 0			; CHECK-NEXT: [[T21:%.*]] = getelementptr inbounds float, float* [[ARG_2:%.*]], i64 0
				; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x float> <float 0.000000e+00, float poison>, float [[T20]], i32 1
	; CHECK-NEXT: br i1 [[C:%.*]], label [[BB22:%.*]], label [[BB30:%.*]]			; CHECK-NEXT: br i1 [[C:%.*]], label [[BB22:%.*]], label [[BB30:%.*]]
	; CHECK: bb22:			; CHECK: bb22:
	; CHECK-NEXT: [[T23:%.*]] = fmul float [[T20]], 9.900000e+01			; CHECK-NEXT: [[T23:%.*]] = fmul float [[T20]], 9.900000e+01
	; CHECK-NEXT: [[T24:%.*]] = fmul float [[T23]], 9.900000e+01
	; CHECK-NEXT: [[T25:%.*]] = getelementptr inbounds float, float* [[T19]], i64 2			; CHECK-NEXT: [[T25:%.*]] = getelementptr inbounds float, float* [[T19]], i64 2
	; CHECK-NEXT: [[T26:%.*]] = fmul float [[T23]], 1.000000e+01			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[T23]], i32 0
	; CHECK-NEXT: store float [[T26]], float* [[T25]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[T23]], i32 1
				; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x float> [[TMP2]], <float 9.900000e+01, float 1.000000e+01>
				; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 1
				; CHECK-NEXT: store float [[TMP4]], float* [[T25]], align 4
	; CHECK-NEXT: [[T27:%.*]] = load float, float* [[T21]], align 8			; CHECK-NEXT: [[T27:%.*]] = load float, float* [[T21]], align 8
	; CHECK-NEXT: [[T28:%.*]] = fadd float [[T24]], 2.000000e+01			; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x float> [[TMP3]], <float 2.000000e+01, float 2.000000e+01>
	; CHECK-NEXT: [[T29:%.*]] = fadd float [[T26]], 2.000000e+01
	; CHECK-NEXT: br label [[BB30]]			; CHECK-NEXT: br label [[BB30]]
	; CHECK: bb30:			; CHECK: bb30:
	; CHECK-NEXT: [[T31:%.*]] = phi float [ [[T28]], [[BB22]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP6:%.*]] = phi <2 x float> [ [[TMP5]], [[BB22]] ], [ [[TMP0]], [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[T32:%.*]] = phi float [ [[T29]], [[BB22]] ], [ [[T20]], [[ENTRY]] ]
	; CHECK-NEXT: br label [[BB36:%.*]]			; CHECK-NEXT: br label [[BB36:%.*]]
	; CHECK: bb36:			; CHECK: bb36:
	; CHECK-NEXT: [[T37:%.*]] = fmul float [[T31]], 3.000000e+00			; CHECK-NEXT: [[TMP7:%.*]] = fmul <2 x float> [[TMP6]], <float 3.000000e+00, float 3.000000e+00>
	; CHECK-NEXT: [[T38:%.*]] = getelementptr inbounds float, float* [[ARG_3]], i64 0			; CHECK-NEXT: [[T38:%.*]] = getelementptr inbounds float, float* [[ARG_3]], i64 0
	; CHECK-NEXT: store float [[T37]], float* [[T38]], align 4			; CHECK-NEXT: [[TMP8:%.*]] = bitcast float* [[T38]] to <2 x float>*
	; CHECK-NEXT: [[T39:%.*]] = fmul float [[T32]], 3.000000e+00			; CHECK-NEXT: store <2 x float> [[TMP7]], <2 x float>* [[TMP8]], align 4
	; CHECK-NEXT: [[T40:%.*]] = getelementptr inbounds float, float* [[ARG_3]], i64 1
	; CHECK-NEXT: store float [[T39]], float* [[T40]], align 4
	; CHECK-NEXT: br label [[BB41:%.*]]			; CHECK-NEXT: br label [[BB41:%.*]]
	; CHECK: bb41:			; CHECK: bb41:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%t19 = load float*, float** %arg			%t19 = load float*, float** %arg
	%t20 = load float, float* %arg.3, align 4			%t20 = load float, float* %arg.3, align 4
	%t21 = getelementptr inbounds float, float* %arg.2, i64 0			%t21 = getelementptr inbounds float, float* %arg.2, i64 0

llvm/test/Transforms/SLPVectorizer/RISCV/rvv-min-vector-size.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -slp-vectorizer -mtriple=riscv64 -mattr=+v \		; RUN: opt < %s -slp-vectorizer -mtriple=riscv64 -mattr=+v \
; RUN: -riscv-v-vector-bits-min=128 -S | FileCheck %s --check-prefixes=CHECK,CHECK-128		; RUN: -riscv-v-vector-bits-min=128 -S | FileCheck %s --check-prefixes=CHECK,CHECK-128
; RUN: opt < %s -slp-vectorizer -mtriple=riscv64 -mattr=+v \		; RUN: opt < %s -slp-vectorizer -mtriple=riscv64 -mattr=+v \
; RUN: -riscv-v-vector-bits-min=256 -S | FileCheck %s --check-prefixes=CHECK,CHECK-256		; RUN: -riscv-v-vector-bits-min=256 -S | FileCheck %s --check-prefixes=CHECK,CHECK-256
; RUN: opt < %s -slp-vectorizer -mtriple=riscv64 -mattr=+v \		; RUN: opt < %s -slp-vectorizer -mtriple=riscv64 -mattr=+v \
; RUN: -riscv-v-vector-bits-min=512 -S | FileCheck %s --check-prefixes=CHECK,CHECK-512		; RUN: -riscv-v-vector-bits-min=512 -S | FileCheck %s --check-prefixes=CHECK,CHECK-512

target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n64-S128"		target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n64-S128"
target triple = "riscv64"		target triple = "riscv64"

define void @foo(i64* nocapture writeonly %da) {		define void @foo(i64* nocapture writeonly %da) {
; CHECK-128-LABEL: @foo(		; CHECK-128-LABEL: @foo(
; CHECK-128-NEXT: entry:		; CHECK-128-NEXT: entry:
; CHECK-128-NEXT: [[TMP0:%.*]] = bitcast i64* [[DA:%.*]] to <2 x i64>*		; CHECK-128-NEXT: store i64 0, i64* [[DA:%.*]], align 8
; CHECK-128-NEXT: store <2 x i64> <i64 0, i64 1>, <2 x i64>* [[TMP0]], align 8		; CHECK-128-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i64, i64* [[DA]], i64 1
		; CHECK-128-NEXT: store i64 1, i64* [[ARRAYIDX1]], align 8
; CHECK-128-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i64, i64* [[DA]], i64 2		; CHECK-128-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i64, i64* [[DA]], i64 2
; CHECK-128-NEXT: [[TMP1:%.*]] = bitcast i64* [[ARRAYIDX2]] to <2 x i64>*		; CHECK-128-NEXT: store i64 2, i64* [[ARRAYIDX2]], align 8
; CHECK-128-NEXT: store <2 x i64> <i64 2, i64 3>, <2 x i64>* [[TMP1]], align 8		; CHECK-128-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i64, i64* [[DA]], i64 3
		; CHECK-128-NEXT: store i64 3, i64* [[ARRAYIDX3]], align 8
; CHECK-128-NEXT: ret void		; CHECK-128-NEXT: ret void
;		;
; CHECK-256-LABEL: @foo(		; CHECK-256-LABEL: @foo(
; CHECK-256-NEXT: entry:		; CHECK-256-NEXT: entry:
; CHECK-256-NEXT: [[TMP0:%.*]] = bitcast i64* [[DA:%.*]] to <4 x i64>*		; CHECK-256-NEXT: [[TMP0:%.*]] = bitcast i64* [[DA:%.*]] to <4 x i64>*
; CHECK-256-NEXT: store <4 x i64> <i64 0, i64 1, i64 2, i64 3>, <4 x i64>* [[TMP0]], align 8		; CHECK-256-NEXT: store <4 x i64> <i64 0, i64 1, i64 2, i64 3>, <4 x i64>* [[TMP0]], align 8
; CHECK-256-NEXT: ret void		; CHECK-256-NEXT: ret void
;		;
Show All 12 Lines	entry:
%arrayidx3 = getelementptr inbounds i64, i64* %da, i64 3		%arrayidx3 = getelementptr inbounds i64, i64* %da, i64 3
store i64 3, i64* %arrayidx3, align 8		store i64 3, i64* %arrayidx3, align 8
ret void		ret void
}		}

define void @foo8(i8* nocapture writeonly %da) {		define void @foo8(i8* nocapture writeonly %da) {
; CHECK-LABEL: @foo8(		; CHECK-LABEL: @foo8(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[DA:%.*]] to <2 x i8>*		; CHECK-NEXT: store i8 0, i8* [[DA:%.*]], align 8
; CHECK-NEXT: store <2 x i8> <i8 0, i8 1>, <2 x i8>* [[TMP0]], align 8		; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i8, i8* [[DA]], i8 1
		; CHECK-NEXT: store i8 1, i8* [[ARRAYIDX1]], align 8
; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i8, i8* [[DA]], i8 2		; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i8, i8* [[DA]], i8 2
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
store i8 0, i8* %da, align 8		store i8 0, i8* %da, align 8
%arrayidx1 = getelementptr inbounds i8, i8* %da, i8 1		%arrayidx1 = getelementptr inbounds i8, i8* %da, i8 1
store i8 1, i8* %arrayidx1, align 8		store i8 1, i8* %arrayidx1, align 8
%arrayidx2 = getelementptr inbounds i8, i8* %da, i8 2		%arrayidx2 = getelementptr inbounds i8, i8* %da, i8 2
ret void		ret void
}		}

llvm/test/Transforms/SLPVectorizer/X86/PR31847.ll

	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i8, i8* [[TMP3]], i32 [[SHR1]]			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i8, i8* [[TMP3]], i32 [[SHR1]]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[D1_DATA_046:%.*]] = phi i8* [ [[TMP3]], [[ENTRY:%.*]] ], [ [[ADD_PTR23_1:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[D1_DATA_046:%.*]] = phi i8* [ [[TMP3]], [[ENTRY:%.*]] ], [ [[ADD_PTR23_1:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[Y_045:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[INC_1:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[Y_045:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[INC_1:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP4:%.*]] = load i8, i8* [[ARRAYIDX]], align 1			; CHECK-NEXT: [[TMP4:%.*]] = load i8, i8* [[ARRAYIDX]], align 1
	; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[TMP4]] to i32
	; CHECK-NEXT: [[SUB:%.*]] = add nsw i32 [[CONV]], -128
	; CHECK-NEXT: [[TMP5:%.*]] = load i8, i8* [[ARRAYIDX2]], align 1			; CHECK-NEXT: [[TMP5:%.*]] = load i8, i8* [[ARRAYIDX2]], align 1
	; CHECK-NEXT: [[CONV3:%.*]] = zext i8 [[TMP5]] to i32			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i8> poison, i8 [[TMP5]], i32 0
	; CHECK-NEXT: [[SUB4:%.*]] = add nsw i32 [[CONV3]], -128			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i8> [[TMP6]], i8 [[TMP4]], i32 1
	; CHECK-NEXT: [[CMP5:%.*]] = icmp sgt i32 [[SUB]], -1			; CHECK-NEXT: [[TMP8:%.*]] = zext <2 x i8> [[TMP7]] to <2 x i32>
	; CHECK-NEXT: [[SUB7:%.*]] = sub nsw i32 128, [[CONV]]			; CHECK-NEXT: [[TMP9:%.*]] = add nsw <2 x i32> [[TMP8]], <i32 -128, i32 -128>
	; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP5]], i32 [[SUB]], i32 [[SUB7]]			; CHECK-NEXT: [[TMP10:%.*]] = icmp sgt <2 x i32> [[TMP9]], <i32 -1, i32 -1>
	; CHECK-NEXT: [[CMP8:%.*]] = icmp sgt i32 [[SUB4]], -1			; CHECK-NEXT: [[TMP11:%.*]] = sub nsw <2 x i32> <i32 128, i32 128>, [[TMP8]]
	; CHECK-NEXT: [[SUB12:%.*]] = sub nsw i32 128, [[CONV3]]			; CHECK-NEXT: [[TMP12:%.*]] = select <2 x i1> [[TMP10]], <2 x i32> [[TMP9]], <2 x i32> [[TMP11]]
	; CHECK-NEXT: [[COND14:%.*]] = select i1 [[CMP8]], i32 [[SUB4]], i32 [[SUB12]]			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x i32> [[TMP12]], i32 0
	; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[COND14]], [[COND]]			; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x i32> [[TMP12]], i32 1
				; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP13]], [[TMP14]]
	; CHECK-NEXT: [[IDX_NEG:%.*]] = sub nsw i32 0, [[ADD]]			; CHECK-NEXT: [[IDX_NEG:%.*]] = sub nsw i32 0, [[ADD]]
	; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8* [[D1_DATA_046]], i32 [[IDX_NEG]]			; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8* [[D1_DATA_046]], i32 [[IDX_NEG]]
	; CHECK-NEXT: [[TMP6:%.*]] = load i8, i8* [[ADD_PTR]], align 1			; CHECK-NEXT: [[TMP15:%.*]] = load i8, i8* [[ADD_PTR]], align 1
	; CHECK-NEXT: [[CONV15:%.*]] = zext i8 [[TMP6]] to i32			; CHECK-NEXT: [[CONV15:%.*]] = zext i8 [[TMP15]] to i32
	; CHECK-NEXT: [[ADD16:%.*]] = add nsw i32 [[CONV15]], [[INTENSITY:%.*]]			; CHECK-NEXT: [[ADD16:%.*]] = add nsw i32 [[CONV15]], [[INTENSITY:%.*]]
	; CHECK-NEXT: [[CONV17:%.*]] = trunc i32 [[ADD16]] to i8			; CHECK-NEXT: [[CONV17:%.*]] = trunc i32 [[ADD16]] to i8
	; CHECK-NEXT: store i8 [[CONV17]], i8* [[ADD_PTR]], align 1			; CHECK-NEXT: store i8 [[CONV17]], i8* [[ADD_PTR]], align 1
	; CHECK-NEXT: [[ADD_PTR18:%.*]] = getelementptr inbounds i8, i8* [[D1_DATA_046]], i32 [[ADD]]			; CHECK-NEXT: [[ADD_PTR18:%.*]] = getelementptr inbounds i8, i8* [[D1_DATA_046]], i32 [[ADD]]
	; CHECK-NEXT: [[TMP7:%.*]] = load i8, i8* [[ADD_PTR18]], align 1			; CHECK-NEXT: [[TMP16:%.*]] = load i8, i8* [[ADD_PTR18]], align 1
	; CHECK-NEXT: [[NOT_TOBOOL:%.*]] = icmp eq i8 [[TMP7]], 0			; CHECK-NEXT: [[NOT_TOBOOL:%.*]] = icmp eq i8 [[TMP16]], 0
	; CHECK-NEXT: [[CONV21:%.*]] = zext i1 [[NOT_TOBOOL]] to i8			; CHECK-NEXT: [[CONV21:%.*]] = zext i1 [[NOT_TOBOOL]] to i8
	; CHECK-NEXT: store i8 [[CONV21]], i8* [[ADD_PTR18]], align 1			; CHECK-NEXT: store i8 [[CONV21]], i8* [[ADD_PTR18]], align 1
	; CHECK-NEXT: [[ADD_PTR23:%.*]] = getelementptr inbounds i8, i8* [[D1_DATA_046]], i32 [[TMP1]]			; CHECK-NEXT: [[ADD_PTR23:%.*]] = getelementptr inbounds i8, i8* [[D1_DATA_046]], i32 [[TMP1]]
	; CHECK-NEXT: [[TMP8:%.*]] = load i8, i8* [[ARRAYIDX]], align 1			; CHECK-NEXT: [[TMP17:%.*]] = load i8, i8* [[ARRAYIDX]], align 1
	; CHECK-NEXT: [[CONV_1:%.*]] = zext i8 [[TMP8]] to i32			; CHECK-NEXT: [[TMP18:%.*]] = load i8, i8* [[ARRAYIDX2]], align 1
	; CHECK-NEXT: [[SUB_1:%.*]] = add nsw i32 [[CONV_1]], -128			; CHECK-NEXT: [[TMP19:%.*]] = insertelement <2 x i8> poison, i8 [[TMP18]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = load i8, i8* [[ARRAYIDX2]], align 1			; CHECK-NEXT: [[TMP20:%.*]] = insertelement <2 x i8> [[TMP19]], i8 [[TMP17]], i32 1
	; CHECK-NEXT: [[CONV3_1:%.*]] = zext i8 [[TMP9]] to i32			; CHECK-NEXT: [[TMP21:%.*]] = zext <2 x i8> [[TMP20]] to <2 x i32>
	; CHECK-NEXT: [[SUB4_1:%.*]] = add nsw i32 [[CONV3_1]], -128			; CHECK-NEXT: [[TMP22:%.*]] = add nsw <2 x i32> [[TMP21]], <i32 -128, i32 -128>
	; CHECK-NEXT: [[CMP5_1:%.*]] = icmp sgt i32 [[SUB_1]], -1			; CHECK-NEXT: [[TMP23:%.*]] = icmp sgt <2 x i32> [[TMP22]], <i32 -1, i32 -1>
	; CHECK-NEXT: [[SUB7_1:%.*]] = sub nsw i32 128, [[CONV_1]]			; CHECK-NEXT: [[TMP24:%.*]] = sub nsw <2 x i32> <i32 128, i32 128>, [[TMP21]]
	; CHECK-NEXT: [[COND_1:%.*]] = select i1 [[CMP5_1]], i32 [[SUB_1]], i32 [[SUB7_1]]			; CHECK-NEXT: [[TMP25:%.*]] = select <2 x i1> [[TMP23]], <2 x i32> [[TMP22]], <2 x i32> [[TMP24]]
	; CHECK-NEXT: [[CMP8_1:%.*]] = icmp sgt i32 [[SUB4_1]], -1			; CHECK-NEXT: [[TMP26:%.*]] = extractelement <2 x i32> [[TMP25]], i32 0
	; CHECK-NEXT: [[SUB12_1:%.*]] = sub nsw i32 128, [[CONV3_1]]			; CHECK-NEXT: [[TMP27:%.*]] = extractelement <2 x i32> [[TMP25]], i32 1
	; CHECK-NEXT: [[COND14_1:%.*]] = select i1 [[CMP8_1]], i32 [[SUB4_1]], i32 [[SUB12_1]]			; CHECK-NEXT: [[ADD_1:%.*]] = add nsw i32 [[TMP26]], [[TMP27]]
	; CHECK-NEXT: [[ADD_1:%.*]] = add nsw i32 [[COND14_1]], [[COND_1]]
	; CHECK-NEXT: [[IDX_NEG_1:%.*]] = sub nsw i32 0, [[ADD_1]]			; CHECK-NEXT: [[IDX_NEG_1:%.*]] = sub nsw i32 0, [[ADD_1]]
	; CHECK-NEXT: [[ADD_PTR_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR23]], i32 [[IDX_NEG_1]]			; CHECK-NEXT: [[ADD_PTR_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR23]], i32 [[IDX_NEG_1]]
	; CHECK-NEXT: [[TMP10:%.*]] = load i8, i8* [[ADD_PTR_1]], align 1			; CHECK-NEXT: [[TMP28:%.*]] = load i8, i8* [[ADD_PTR_1]], align 1
	; CHECK-NEXT: [[CONV15_1:%.*]] = zext i8 [[TMP10]] to i32			; CHECK-NEXT: [[CONV15_1:%.*]] = zext i8 [[TMP28]] to i32
	; CHECK-NEXT: [[ADD16_1:%.*]] = add nsw i32 [[CONV15_1]], [[INTENSITY]]			; CHECK-NEXT: [[ADD16_1:%.*]] = add nsw i32 [[CONV15_1]], [[INTENSITY]]
	; CHECK-NEXT: [[CONV17_1:%.*]] = trunc i32 [[ADD16_1]] to i8			; CHECK-NEXT: [[CONV17_1:%.*]] = trunc i32 [[ADD16_1]] to i8
	; CHECK-NEXT: store i8 [[CONV17_1]], i8* [[ADD_PTR_1]], align 1			; CHECK-NEXT: store i8 [[CONV17_1]], i8* [[ADD_PTR_1]], align 1
	; CHECK-NEXT: [[ADD_PTR18_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR23]], i32 [[ADD_1]]			; CHECK-NEXT: [[ADD_PTR18_1:%.*]] = getelementptr inbounds i8, i8* [[ADD_PTR23]], i32 [[ADD_1]]
	; CHECK-NEXT: [[TMP11:%.*]] = load i8, i8* [[ADD_PTR18_1]], align 1			; CHECK-NEXT: [[TMP29:%.*]] = load i8, i8* [[ADD_PTR18_1]], align 1
	; CHECK-NEXT: [[NOT_TOBOOL_1:%.*]] = icmp eq i8 [[TMP11]], 0			; CHECK-NEXT: [[NOT_TOBOOL_1:%.*]] = icmp eq i8 [[TMP29]], 0
	; CHECK-NEXT: [[CONV21_1:%.*]] = zext i1 [[NOT_TOBOOL_1]] to i8			; CHECK-NEXT: [[CONV21_1:%.*]] = zext i1 [[NOT_TOBOOL_1]] to i8
	; CHECK-NEXT: store i8 [[CONV21_1]], i8* [[ADD_PTR18_1]], align 1			; CHECK-NEXT: store i8 [[CONV21_1]], i8* [[ADD_PTR18_1]], align 1
	; CHECK-NEXT: [[ADD_PTR23_1]] = getelementptr inbounds i8, i8* [[ADD_PTR23]], i32 [[TMP1]]			; CHECK-NEXT: [[ADD_PTR23_1]] = getelementptr inbounds i8, i8* [[ADD_PTR23]], i32 [[TMP1]]
	; CHECK-NEXT: [[INC_1]] = add nsw i32 [[Y_045]], 2			; CHECK-NEXT: [[INC_1]] = add nsw i32 [[Y_045]], 2
	; CHECK-NEXT: [[EXITCOND_1:%.*]] = icmp eq i32 [[INC_1]], 128			; CHECK-NEXT: [[EXITCOND_1:%.*]] = icmp eq i32 [[INC_1]], 128
	; CHECK-NEXT: br i1 [[EXITCOND_1]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND_1]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]
	;			;
	entry:			entry:

llvm/test/Transforms/SLPVectorizer/X86/alternate-fp-inseltpoison.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=SSE		; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=SLM		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=AVX		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=AVX		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=AVX512		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=AVX512		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s

define <8 x float> @fadd_fsub_v8f32(<8 x float> %a, <8 x float> %b) {		define <8 x float> @fadd_fsub_v8f32(<8 x float> %a, <8 x float> %b) {
; CHECK-LABEL: @fadd_fsub_v8f32(		; CHECK-LABEL: @fadd_fsub_v8f32(
; CHECK-NEXT: [[TMP1:%.*]] = fadd <8 x float> [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = fadd <8 x float> [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[TMP2:%.*]] = fsub <8 x float> [[A]], [[B]]		; CHECK-NEXT: [[TMP2:%.*]] = fsub <8 x float> [[A]], [[B]]
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> [[TMP2]], <8 x i32> <i32 0, i32 9, i32 10, i32 3, i32 4, i32 13, i32 14, i32 7>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> [[TMP2]], <8 x i32> <i32 0, i32 9, i32 10, i32 3, i32 4, i32 13, i32 14, i32 7>
; CHECK-NEXT: ret <8 x float> [[TMP3]]		; CHECK-NEXT: ret <8 x float> [[TMP3]]
;		;
[70 unchanged lines not shown]
%r4 = insertelement <8 x float> %r3, float %ab4, i32 4		%r4 = insertelement <8 x float> %r3, float %ab4, i32 4
%r5 = insertelement <8 x float> %r4, float %ab5, i32 5		%r5 = insertelement <8 x float> %r4, float %ab5, i32 5
%r6 = insertelement <8 x float> %r5, float %ab6, i32 6		%r6 = insertelement <8 x float> %r5, float %ab6, i32 6
%r7 = insertelement <8 x float> %r6, float %ab7, i32 7		%r7 = insertelement <8 x float> %r6, float %ab7, i32 7
ret <8 x float> %r7		ret <8 x float> %r7
}		}

define <4 x float> @fmul_fdiv_v4f32_const(<4 x float> %a) {		define <4 x float> @fmul_fdiv_v4f32_const(<4 x float> %a) {
; SSE-LABEL: @fmul_fdiv_v4f32_const(		; CHECK-LABEL: @fmul_fdiv_v4f32_const(
; SSE-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>		; CHECK-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>
; SSE-NEXT: ret <4 x float> [[TMP1]]		; CHECK-NEXT: ret <4 x float> [[TMP1]]
;
; SLM-LABEL: @fmul_fdiv_v4f32_const(
; SLM-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A:%.*]], i64 2
; SLM-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i64 3
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A]], <4 x float> undef, <2 x i32> <i32 0, i32 1>
; SLM-NEXT: [[TMP2:%.*]] = fmul <2 x float> [[TMP1]], <float 2.000000e+00, float 1.000000e+00>
; SLM-NEXT: [[AB3:%.*]] = fmul float [[A3]], 2.000000e+00
; SLM-NEXT: [[TMP3:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; SLM-NEXT: [[R2:%.*]] = insertelement <4 x float> [[TMP3]], float [[A2]], i64 2
; SLM-NEXT: [[R3:%.*]] = insertelement <4 x float> [[R2]], float [[AB3]], i64 3
; SLM-NEXT: ret <4 x float> [[R3]]
;
; AVX-LABEL: @fmul_fdiv_v4f32_const(
; AVX-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>
; AVX-NEXT: ret <4 x float> [[TMP1]]
;
; AVX512-LABEL: @fmul_fdiv_v4f32_const(
; AVX512-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>
; AVX512-NEXT: ret <4 x float> [[TMP1]]
;		;
%a0 = extractelement <4 x float> %a, i32 0		%a0 = extractelement <4 x float> %a, i32 0
%a1 = extractelement <4 x float> %a, i32 1		%a1 = extractelement <4 x float> %a, i32 1
%a2 = extractelement <4 x float> %a, i32 2		%a2 = extractelement <4 x float> %a, i32 2
%a3 = extractelement <4 x float> %a, i32 3		%a3 = extractelement <4 x float> %a, i32 3
%ab0 = fmul float %a0, 2.0		%ab0 = fmul float %a0, 2.0
%ab1 = fmul float %a1, 1.0		%ab1 = fmul float %a1, 1.0
%ab2 = fdiv float %a2, 1.0		%ab2 = fdiv float %a2, 1.0
%ab3 = fdiv float %a3, 0.5		%ab3 = fdiv float %a3, 0.5
%r0 = insertelement <4 x float> poison, float %ab0, i32 0		%r0 = insertelement <4 x float> poison, float %ab0, i32 0
%r1 = insertelement <4 x float> %r0, float %ab1, i32 1		%r1 = insertelement <4 x float> %r0, float %ab1, i32 1
%r2 = insertelement <4 x float> %r1, float %ab2, i32 2		%r2 = insertelement <4 x float> %r1, float %ab2, i32 2
%r3 = insertelement <4 x float> %r2, float %ab3, i32 3		%r3 = insertelement <4 x float> %r2, float %ab3, i32 3
ret <4 x float> %r3		ret <4 x float> %r3
}		}

llvm/test/Transforms/SLPVectorizer/X86/alternate-fp.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=SSE		; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=SLM		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=AVX		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=AVX		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=core-avx2 -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=AVX512		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=knl -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s
; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s --check-prefix=CHECK --check-prefix=AVX512		; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=skx -basic-aa -slp-vectorizer -instcombine -S | FileCheck %s

define <8 x float> @fadd_fsub_v8f32(<8 x float> %a, <8 x float> %b) {		define <8 x float> @fadd_fsub_v8f32(<8 x float> %a, <8 x float> %b) {
; CHECK-LABEL: @fadd_fsub_v8f32(		; CHECK-LABEL: @fadd_fsub_v8f32(
; CHECK-NEXT: [[TMP1:%.*]] = fadd <8 x float> [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = fadd <8 x float> [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[TMP2:%.*]] = fsub <8 x float> [[A]], [[B]]		; CHECK-NEXT: [[TMP2:%.*]] = fsub <8 x float> [[A]], [[B]]
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> [[TMP2]], <8 x i32> <i32 0, i32 9, i32 10, i32 3, i32 4, i32 13, i32 14, i32 7>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> [[TMP2]], <8 x i32> <i32 0, i32 9, i32 10, i32 3, i32 4, i32 13, i32 14, i32 7>
; CHECK-NEXT: ret <8 x float> [[TMP3]]		; CHECK-NEXT: ret <8 x float> [[TMP3]]
;		;
[70 unchanged lines not shown]
%r4 = insertelement <8 x float> %r3, float %ab4, i32 4		%r4 = insertelement <8 x float> %r3, float %ab4, i32 4
%r5 = insertelement <8 x float> %r4, float %ab5, i32 5		%r5 = insertelement <8 x float> %r4, float %ab5, i32 5
%r6 = insertelement <8 x float> %r5, float %ab6, i32 6		%r6 = insertelement <8 x float> %r5, float %ab6, i32 6
%r7 = insertelement <8 x float> %r6, float %ab7, i32 7		%r7 = insertelement <8 x float> %r6, float %ab7, i32 7
ret <8 x float> %r7		ret <8 x float> %r7
}		}

define <4 x float> @fmul_fdiv_v4f32_const(<4 x float> %a) {		define <4 x float> @fmul_fdiv_v4f32_const(<4 x float> %a) {
; SSE-LABEL: @fmul_fdiv_v4f32_const(		; CHECK-LABEL: @fmul_fdiv_v4f32_const(
; SSE-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>		; CHECK-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>
; SSE-NEXT: ret <4 x float> [[TMP1]]		; CHECK-NEXT: ret <4 x float> [[TMP1]]
;
; SLM-LABEL: @fmul_fdiv_v4f32_const(
; SLM-NEXT: [[A2:%.*]] = extractelement <4 x float> [[A:%.*]], i64 2
; SLM-NEXT: [[A3:%.*]] = extractelement <4 x float> [[A]], i64 3
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x float> [[A]], <4 x float> undef, <2 x i32> <i32 0, i32 1>
; SLM-NEXT: [[TMP2:%.*]] = fmul <2 x float> [[TMP1]], <float 2.000000e+00, float 1.000000e+00>
; SLM-NEXT: [[AB3:%.*]] = fmul float [[A3]], 2.000000e+00
; SLM-NEXT: [[TMP3:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; SLM-NEXT: [[R2:%.*]] = insertelement <4 x float> [[TMP3]], float [[A2]], i64 2
; SLM-NEXT: [[R3:%.*]] = insertelement <4 x float> [[R2]], float [[AB3]], i64 3
; SLM-NEXT: ret <4 x float> [[R3]]
;
; AVX-LABEL: @fmul_fdiv_v4f32_const(
; AVX-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>
; AVX-NEXT: ret <4 x float> [[TMP1]]
;
; AVX512-LABEL: @fmul_fdiv_v4f32_const(
; AVX512-NEXT: [[TMP1:%.*]] = fmul <4 x float> [[A:%.*]], <float 2.000000e+00, float 1.000000e+00, float 1.000000e+00, float 2.000000e+00>
; AVX512-NEXT: ret <4 x float> [[TMP1]]
;		;
%a0 = extractelement <4 x float> %a, i32 0		%a0 = extractelement <4 x float> %a, i32 0
%a1 = extractelement <4 x float> %a, i32 1		%a1 = extractelement <4 x float> %a, i32 1
%a2 = extractelement <4 x float> %a, i32 2		%a2 = extractelement <4 x float> %a, i32 2
%a3 = extractelement <4 x float> %a, i32 3		%a3 = extractelement <4 x float> %a, i32 3
%ab0 = fmul float %a0, 2.0		%ab0 = fmul float %a0, 2.0
%ab1 = fmul float %a1, 1.0		%ab1 = fmul float %a1, 1.0
%ab2 = fdiv float %a2, 1.0		%ab2 = fdiv float %a2, 1.0
%ab3 = fdiv float %a3, 0.5		%ab3 = fdiv float %a3, 0.5
%r0 = insertelement <4 x float> undef, float %ab0, i32 0		%r0 = insertelement <4 x float> undef, float %ab0, i32 0
%r1 = insertelement <4 x float> %r0, float %ab1, i32 1		%r1 = insertelement <4 x float> %r0, float %ab1, i32 1
%r2 = insertelement <4 x float> %r1, float %ab2, i32 2		%r2 = insertelement <4 x float> %r1, float %ab2, i32 2
%r3 = insertelement <4 x float> %r2, float %ab3, i32 3		%r3 = insertelement <4 x float> %r2, float %ab3, i32 3
ret <4 x float> %r3		ret <4 x float> %r3
}		}

llvm/test/Transforms/SLPVectorizer/X86/bool-mask.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown -mcpu=x86-64 -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=SSE,SSE2		; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown -mcpu=x86-64 -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=SSE2
; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown -mcpu=x86-64-v2 -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=SSE,SSE4		; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown -mcpu=x86-64-v2 -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=SSE4
; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown -mcpu=x86-64-v3 -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=AVX		; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown -mcpu=x86-64-v3 -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=AVX
; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown -mcpu=x86-64-v4 -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=AVX512		; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-unknown -mcpu=x86-64-v4 -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=AVX512

; // PR42652		; // PR42652
; unsigned long bitmask_16xi8(const char *src) {		; unsigned long bitmask_16xi8(const char *src) {
; unsigned long mask = 0;		; unsigned long mask = 0;
; for (unsigned i = 0; i != 16; ++i) {		; for (unsigned i = 0; i != 16; ++i) {
; if (src[i])		; if (src[i])
; mask |= (1ull << i);		; mask |= (1ull << i);
; }		; }
; return mask;		; return mask;
; }		; }

define i64 @bitmask_16xi8(ptr nocapture noundef readonly %src) {		define i64 @bitmask_16xi8(ptr nocapture noundef readonly %src) {
; SSE-LABEL: @bitmask_16xi8(		; SSE2-LABEL: @bitmask_16xi8(
; SSE-NEXT: entry:		; SSE2-NEXT: entry:
; SSE-NEXT: [[TMP0:%.*]] = load i8, ptr [[SRC:%.*]], align 1		; SSE2-NEXT: [[TMP0:%.*]] = load i8, ptr [[SRC:%.*]], align 1
; SSE-NEXT: [[TOBOOL_NOT:%.*]] = icmp ne i8 [[TMP0]], 0		; SSE2-NEXT: [[TOBOOL_NOT:%.*]] = icmp ne i8 [[TMP0]], 0
; SSE-NEXT: [[OR:%.*]] = zext i1 [[TOBOOL_NOT]] to i64		; SSE2-NEXT: [[OR:%.*]] = zext i1 [[TOBOOL_NOT]] to i64
; SSE-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 1		; SSE2-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 1
; SSE-NEXT: [[TMP1:%.*]] = load <8 x i8>, ptr [[ARRAYIDX_1]], align 1		; SSE2-NEXT: [[TMP1:%.*]] = load <8 x i8>, ptr [[ARRAYIDX_1]], align 1
; SSE-NEXT: [[TMP2:%.*]] = icmp eq <8 x i8> [[TMP1]], zeroinitializer		; SSE2-NEXT: [[TMP2:%.*]] = icmp eq <8 x i8> [[TMP1]], zeroinitializer
; SSE-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i64> zeroinitializer, <8 x i64> <i64 2, i64 4, i64 8, i64 16, i64 32, i64 64, i64 128, i64 256>		; SSE2-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i64> zeroinitializer, <8 x i64> <i64 2, i64 4, i64 8, i64 16, i64 32, i64 64, i64 128, i64 256>
; SSE-NEXT: [[ARRAYIDX_9:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 9		; SSE2-NEXT: [[ARRAYIDX_9:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 9
; SSE-NEXT: [[TMP4:%.*]] = load <4 x i8>, ptr [[ARRAYIDX_9]], align 1		; SSE2-NEXT: [[TMP4:%.*]] = load <4 x i8>, ptr [[ARRAYIDX_9]], align 1
; SSE-NEXT: [[TMP5:%.*]] = icmp eq <4 x i8> [[TMP4]], zeroinitializer		; SSE2-NEXT: [[TMP5:%.*]] = icmp eq <4 x i8> [[TMP4]], zeroinitializer
; SSE-NEXT: [[TMP6:%.*]] = select <4 x i1> [[TMP5]], <4 x i64> zeroinitializer, <4 x i64> <i64 512, i64 1024, i64 2048, i64 4096>		; SSE2-NEXT: [[TMP6:%.*]] = select <4 x i1> [[TMP5]], <4 x i64> zeroinitializer, <4 x i64> <i64 512, i64 1024, i64 2048, i64 4096>
; SSE-NEXT: [[ARRAYIDX_13:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 13		; SSE2-NEXT: [[ARRAYIDX_13:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 13
; SSE-NEXT: [[TMP7:%.*]] = load i8, ptr [[ARRAYIDX_13]], align 1		; SSE2-NEXT: [[TMP7:%.*]] = load i8, ptr [[ARRAYIDX_13]], align 1
; SSE-NEXT: [[TOBOOL_NOT_13:%.*]] = icmp eq i8 [[TMP7]], 0		; SSE2-NEXT: [[TOBOOL_NOT_13:%.*]] = icmp eq i8 [[TMP7]], 0
; SSE-NEXT: [[OR_13:%.*]] = select i1 [[TOBOOL_NOT_13]], i64 0, i64 8192		; SSE2-NEXT: [[OR_13:%.*]] = select i1 [[TOBOOL_NOT_13]], i64 0, i64 8192
; SSE-NEXT: [[ARRAYIDX_14:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 14		; SSE2-NEXT: [[ARRAYIDX_14:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 14
; SSE-NEXT: [[TMP8:%.*]] = load i8, ptr [[ARRAYIDX_14]], align 1		; SSE2-NEXT: [[TMP8:%.*]] = load i8, ptr [[ARRAYIDX_14]], align 1
; SSE-NEXT: [[TOBOOL_NOT_14:%.*]] = icmp eq i8 [[TMP8]], 0		; SSE2-NEXT: [[TOBOOL_NOT_14:%.*]] = icmp eq i8 [[TMP8]], 0
; SSE-NEXT: [[OR_14:%.*]] = select i1 [[TOBOOL_NOT_14]], i64 0, i64 16384		; SSE2-NEXT: [[OR_14:%.*]] = select i1 [[TOBOOL_NOT_14]], i64 0, i64 16384
; SSE-NEXT: [[ARRAYIDX_15:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 15		; SSE2-NEXT: [[ARRAYIDX_15:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 15
; SSE-NEXT: [[TMP9:%.*]] = load i8, ptr [[ARRAYIDX_15]], align 1		; SSE2-NEXT: [[TMP9:%.*]] = load i8, ptr [[ARRAYIDX_15]], align 1
; SSE-NEXT: [[TOBOOL_NOT_15:%.*]] = icmp eq i8 [[TMP9]], 0		; SSE2-NEXT: [[TOBOOL_NOT_15:%.*]] = icmp eq i8 [[TMP9]], 0
; SSE-NEXT: [[OR_15:%.*]] = select i1 [[TOBOOL_NOT_15]], i64 0, i64 32768		; SSE2-NEXT: [[OR_15:%.*]] = select i1 [[TOBOOL_NOT_15]], i64 0, i64 32768
; SSE-NEXT: [[TMP10:%.*]] = call i64 @llvm.vector.reduce.or.v8i64(<8 x i64> [[TMP3]])		; SSE2-NEXT: [[TMP10:%.*]] = call i64 @llvm.vector.reduce.or.v8i64(<8 x i64> [[TMP3]])
; SSE-NEXT: [[TMP11:%.*]] = call i64 @llvm.vector.reduce.or.v4i64(<4 x i64> [[TMP6]])		; SSE2-NEXT: [[TMP11:%.*]] = call i64 @llvm.vector.reduce.or.v4i64(<4 x i64> [[TMP6]])
; SSE-NEXT: [[OP_RDX:%.*]] = or i64 [[TMP10]], [[TMP11]]		; SSE2-NEXT: [[OP_RDX:%.*]] = or i64 [[TMP10]], [[TMP11]]
; SSE-NEXT: [[OP_RDX1:%.*]] = or i64 [[OR_13]], [[OR_14]]		; SSE2-NEXT: [[OP_RDX1:%.*]] = or i64 [[OR_13]], [[OR_14]]
; SSE-NEXT: [[OP_RDX2:%.*]] = or i64 [[OR_15]], [[OR]]		; SSE2-NEXT: [[OP_RDX2:%.*]] = or i64 [[OR_15]], [[OR]]
; SSE-NEXT: [[OP_RDX3:%.*]] = or i64 [[OP_RDX1]], [[OP_RDX2]]		; SSE2-NEXT: [[OP_RDX3:%.*]] = or i64 [[OP_RDX1]], [[OP_RDX2]]
; SSE-NEXT: [[OP_RDX4:%.*]] = or i64 [[OP_RDX]], [[OP_RDX3]]		; SSE2-NEXT: [[OP_RDX4:%.*]] = or i64 [[OP_RDX]], [[OP_RDX3]]
; SSE-NEXT: ret i64 [[OP_RDX4]]		; SSE2-NEXT: ret i64 [[OP_RDX4]]
		;
		; SSE4-LABEL: @bitmask_16xi8(
		; SSE4-NEXT: entry:
		; SSE4-NEXT: [[TMP0:%.*]] = load i8, ptr [[SRC:%.*]], align 1
		; SSE4-NEXT: [[TOBOOL_NOT:%.*]] = icmp ne i8 [[TMP0]], 0
		; SSE4-NEXT: [[OR:%.*]] = zext i1 [[TOBOOL_NOT]] to i64
		; SSE4-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 1
		; SSE4-NEXT: [[TMP1:%.*]] = load <8 x i8>, ptr [[ARRAYIDX_1]], align 1
		; SSE4-NEXT: [[TMP2:%.*]] = icmp eq <8 x i8> [[TMP1]], zeroinitializer
		; SSE4-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i64> zeroinitializer, <8 x i64> <i64 2, i64 4, i64 8, i64 16, i64 32, i64 64, i64 128, i64 256>
		; SSE4-NEXT: [[ARRAYIDX_9:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 9
		; SSE4-NEXT: [[TMP4:%.*]] = load <4 x i8>, ptr [[ARRAYIDX_9]], align 1
		; SSE4-NEXT: [[TMP5:%.*]] = icmp eq <4 x i8> [[TMP4]], zeroinitializer
		; SSE4-NEXT: [[TMP6:%.*]] = select <4 x i1> [[TMP5]], <4 x i64> zeroinitializer, <4 x i64> <i64 512, i64 1024, i64 2048, i64 4096>
		; SSE4-NEXT: [[ARRAYIDX_13:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 13
		; SSE4-NEXT: [[TMP7:%.*]] = load <2 x i8>, ptr [[ARRAYIDX_13]], align 1
		; SSE4-NEXT: [[TMP8:%.*]] = icmp eq <2 x i8> [[TMP7]], zeroinitializer
		; SSE4-NEXT: [[TMP9:%.*]] = select <2 x i1> [[TMP8]], <2 x i64> zeroinitializer, <2 x i64> <i64 8192, i64 16384>
		; SSE4-NEXT: [[ARRAYIDX_15:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 15
		; SSE4-NEXT: [[TMP10:%.*]] = load i8, ptr [[ARRAYIDX_15]], align 1
		; SSE4-NEXT: [[TOBOOL_NOT_15:%.*]] = icmp eq i8 [[TMP10]], 0
		; SSE4-NEXT: [[OR_15:%.*]] = select i1 [[TOBOOL_NOT_15]], i64 0, i64 32768
		; SSE4-NEXT: [[TMP11:%.*]] = call i64 @llvm.vector.reduce.or.v8i64(<8 x i64> [[TMP3]])
		; SSE4-NEXT: [[TMP12:%.*]] = call i64 @llvm.vector.reduce.or.v4i64(<4 x i64> [[TMP6]])
		; SSE4-NEXT: [[OP_RDX:%.*]] = or i64 [[TMP11]], [[TMP12]]
		; SSE4-NEXT: [[TMP13:%.*]] = extractelement <2 x i64> [[TMP9]], i32 0
		; SSE4-NEXT: [[TMP14:%.*]] = extractelement <2 x i64> [[TMP9]], i32 1
		; SSE4-NEXT: [[OP_RDX1:%.*]] = or i64 [[TMP13]], [[TMP14]]
		; SSE4-NEXT: [[OP_RDX2:%.*]] = or i64 [[OR_15]], [[OR]]
		; SSE4-NEXT: [[OP_RDX3:%.*]] = or i64 [[OP_RDX1]], [[OP_RDX2]]
		; SSE4-NEXT: [[OP_RDX4:%.*]] = or i64 [[OP_RDX]], [[OP_RDX3]]
		; SSE4-NEXT: ret i64 [[OP_RDX4]]
;		;
; AVX-LABEL: @bitmask_16xi8(		; AVX-LABEL: @bitmask_16xi8(
; AVX-NEXT: entry:		; AVX-NEXT: entry:
; AVX-NEXT: [[TMP0:%.*]] = load i8, ptr [[SRC:%.*]], align 1		; AVX-NEXT: [[TMP0:%.*]] = load i8, ptr [[SRC:%.*]], align 1
; AVX-NEXT: [[TOBOOL_NOT:%.*]] = icmp ne i8 [[TMP0]], 0		; AVX-NEXT: [[TOBOOL_NOT:%.*]] = icmp ne i8 [[TMP0]], 0
; AVX-NEXT: [[OR:%.*]] = zext i1 [[TOBOOL_NOT]] to i64		; AVX-NEXT: [[OR:%.*]] = zext i1 [[TOBOOL_NOT]] to i64
; AVX-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 1		; AVX-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 1
; AVX-NEXT: [[TMP1:%.*]] = load <8 x i8>, ptr [[ARRAYIDX_1]], align 1		; AVX-NEXT: [[TMP1:%.*]] = load <8 x i8>, ptr [[ARRAYIDX_1]], align 1
; AVX-NEXT: [[TMP2:%.*]] = icmp eq <8 x i8> [[TMP1]], zeroinitializer		; AVX-NEXT: [[TMP2:%.*]] = icmp eq <8 x i8> [[TMP1]], zeroinitializer
; AVX-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i64> zeroinitializer, <8 x i64> <i64 2, i64 4, i64 8, i64 16, i64 32, i64 64, i64 128, i64 256>		; AVX-NEXT: [[TMP3:%.*]] = select <8 x i1> [[TMP2]], <8 x i64> zeroinitializer, <8 x i64> <i64 2, i64 4, i64 8, i64 16, i64 32, i64 64, i64 128, i64 256>
; AVX-NEXT: [[ARRAYIDX_9:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 9		; AVX-NEXT: [[ARRAYIDX_9:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 9
; AVX-NEXT: [[TMP4:%.*]] = load <4 x i8>, ptr [[ARRAYIDX_9]], align 1		; AVX-NEXT: [[TMP4:%.*]] = load <4 x i8>, ptr [[ARRAYIDX_9]], align 1
; AVX-NEXT: [[TMP5:%.*]] = icmp eq <4 x i8> [[TMP4]], zeroinitializer		; AVX-NEXT: [[TMP5:%.*]] = icmp eq <4 x i8> [[TMP4]], zeroinitializer
; AVX-NEXT: [[TMP6:%.*]] = select <4 x i1> [[TMP5]], <4 x i64> zeroinitializer, <4 x i64> <i64 512, i64 1024, i64 2048, i64 4096>		; AVX-NEXT: [[TMP6:%.*]] = select <4 x i1> [[TMP5]], <4 x i64> zeroinitializer, <4 x i64> <i64 512, i64 1024, i64 2048, i64 4096>
; AVX-NEXT: [[ARRAYIDX_13:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 13		; AVX-NEXT: [[ARRAYIDX_13:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 13
; AVX-NEXT: [[TMP7:%.*]] = load i8, ptr [[ARRAYIDX_13]], align 1		; AVX-NEXT: [[TMP7:%.*]] = load <2 x i8>, ptr [[ARRAYIDX_13]], align 1
; AVX-NEXT: [[TOBOOL_NOT_13:%.*]] = icmp eq i8 [[TMP7]], 0		; AVX-NEXT: [[TMP8:%.*]] = icmp eq <2 x i8> [[TMP7]], zeroinitializer
; AVX-NEXT: [[OR_13:%.*]] = select i1 [[TOBOOL_NOT_13]], i64 0, i64 8192		; AVX-NEXT: [[TMP9:%.*]] = select <2 x i1> [[TMP8]], <2 x i64> zeroinitializer, <2 x i64> <i64 8192, i64 16384>
; AVX-NEXT: [[ARRAYIDX_14:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 14
; AVX-NEXT: [[TMP8:%.*]] = load i8, ptr [[ARRAYIDX_14]], align 1
; AVX-NEXT: [[TOBOOL_NOT_14:%.*]] = icmp eq i8 [[TMP8]], 0
; AVX-NEXT: [[OR_14:%.*]] = select i1 [[TOBOOL_NOT_14]], i64 0, i64 16384
; AVX-NEXT: [[ARRAYIDX_15:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 15		; AVX-NEXT: [[ARRAYIDX_15:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i64 15
; AVX-NEXT: [[TMP9:%.*]] = load i8, ptr [[ARRAYIDX_15]], align 1		; AVX-NEXT: [[TMP10:%.*]] = load i8, ptr [[ARRAYIDX_15]], align 1
; AVX-NEXT: [[TOBOOL_NOT_15:%.*]] = icmp eq i8 [[TMP9]], 0		; AVX-NEXT: [[TOBOOL_NOT_15:%.*]] = icmp eq i8 [[TMP10]], 0
; AVX-NEXT: [[OR_15:%.*]] = select i1 [[TOBOOL_NOT_15]], i64 0, i64 32768		; AVX-NEXT: [[OR_15:%.*]] = select i1 [[TOBOOL_NOT_15]], i64 0, i64 32768
; AVX-NEXT: [[TMP10:%.*]] = call i64 @llvm.vector.reduce.or.v8i64(<8 x i64> [[TMP3]])		; AVX-NEXT: [[TMP11:%.*]] = call i64 @llvm.vector.reduce.or.v8i64(<8 x i64> [[TMP3]])
; AVX-NEXT: [[TMP11:%.*]] = call i64 @llvm.vector.reduce.or.v4i64(<4 x i64> [[TMP6]])		; AVX-NEXT: [[TMP12:%.*]] = call i64 @llvm.vector.reduce.or.v4i64(<4 x i64> [[TMP6]])
; AVX-NEXT: [[OP_RDX:%.*]] = or i64 [[TMP10]], [[TMP11]]		; AVX-NEXT: [[OP_RDX:%.*]] = or i64 [[TMP11]], [[TMP12]]
; AVX-NEXT: [[OP_RDX1:%.*]] = or i64 [[OR_13]], [[OR_14]]		; AVX-NEXT: [[TMP13:%.*]] = extractelement <2 x i64> [[TMP9]], i32 0
		; AVX-NEXT: [[TMP14:%.*]] = extractelement <2 x i64> [[TMP9]], i32 1
		; AVX-NEXT: [[OP_RDX1:%.*]] = or i64 [[TMP13]], [[TMP14]]
; AVX-NEXT: [[OP_RDX2:%.*]] = or i64 [[OR_15]], [[OR]]		; AVX-NEXT: [[OP_RDX2:%.*]] = or i64 [[OR_15]], [[OR]]
; AVX-NEXT: [[OP_RDX3:%.*]] = or i64 [[OP_RDX1]], [[OP_RDX2]]		; AVX-NEXT: [[OP_RDX3:%.*]] = or i64 [[OP_RDX1]], [[OP_RDX2]]
; AVX-NEXT: [[OP_RDX4:%.*]] = or i64 [[OP_RDX]], [[OP_RDX3]]		; AVX-NEXT: [[OP_RDX4:%.*]] = or i64 [[OP_RDX]], [[OP_RDX3]]
; AVX-NEXT: ret i64 [[OP_RDX4]]		; AVX-NEXT: ret i64 [[OP_RDX4]]
;		;
; AVX512-LABEL: @bitmask_16xi8(		; AVX512-LABEL: @bitmask_16xi8(
; AVX512-NEXT: entry:		; AVX512-NEXT: entry:
; AVX512-NEXT: [[TMP0:%.*]] = load i8, ptr [[SRC:%.*]], align 1		; AVX512-NEXT: [[TMP0:%.*]] = load i8, ptr [[SRC:%.*]], align 1
[104 unchanged lines not shown]
%15 = load i8, ptr %arrayidx.15, align 1		%15 = load i8, ptr %arrayidx.15, align 1
%tobool.not.15 = icmp eq i8 %15, 0		%tobool.not.15 = icmp eq i8 %15, 0
%or.15 = select i1 %tobool.not.15, i64 0, i64 32768		%or.15 = select i1 %tobool.not.15, i64 0, i64 32768
%mask.1.15 = or i64 %or.15, %mask.1.14		%mask.1.15 = or i64 %or.15, %mask.1.14
ret i64 %mask.1.15		ret i64 %mask.1.15
}		}

define i64 @bitmask_4xi16(ptr nocapture noundef readonly %src) {		define i64 @bitmask_4xi16(ptr nocapture noundef readonly %src) {
; SSE-LABEL: @bitmask_4xi16(		; SSE2-LABEL: @bitmask_4xi16(
; SSE-NEXT: entry:		; SSE2-NEXT: entry:
; SSE-NEXT: [[TMP0:%.*]] = load i16, ptr [[SRC:%.*]], align 2		; SSE2-NEXT: [[TMP0:%.*]] = load i16, ptr [[SRC:%.*]], align 2
; SSE-NEXT: [[TOBOOL_NOT:%.*]] = icmp ne i16 [[TMP0]], 0		; SSE2-NEXT: [[TOBOOL_NOT:%.*]] = icmp ne i16 [[TMP0]], 0
; SSE-NEXT: [[OR:%.*]] = zext i1 [[TOBOOL_NOT]] to i64		; SSE2-NEXT: [[OR:%.*]] = zext i1 [[TOBOOL_NOT]] to i64
; SSE-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i16, ptr [[SRC]], i64 1		; SSE2-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i16, ptr [[SRC]], i64 1
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i16>, ptr [[ARRAYIDX_1]], align 2		; SSE2-NEXT: [[TMP1:%.*]] = load <4 x i16>, ptr [[ARRAYIDX_1]], align 2
; SSE-NEXT: [[TMP2:%.*]] = icmp eq <4 x i16> [[TMP1]], zeroinitializer		; SSE2-NEXT: [[TMP2:%.*]] = icmp eq <4 x i16> [[TMP1]], zeroinitializer
; SSE-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP2]], <4 x i64> zeroinitializer, <4 x i64> <i64 2, i64 4, i64 8, i64 16>		; SSE2-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP2]], <4 x i64> zeroinitializer, <4 x i64> <i64 2, i64 4, i64 8, i64 16>
; SSE-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i16, ptr [[SRC]], i64 5		; SSE2-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i16, ptr [[SRC]], i64 5
; SSE-NEXT: [[TMP4:%.*]] = load i16, ptr [[ARRAYIDX_5]], align 2		; SSE2-NEXT: [[TMP4:%.*]] = load i16, ptr [[ARRAYIDX_5]], align 2
; SSE-NEXT: [[TOBOOL_NOT_5:%.*]] = icmp eq i16 [[TMP4]], 0		; SSE2-NEXT: [[TOBOOL_NOT_5:%.*]] = icmp eq i16 [[TMP4]], 0
; SSE-NEXT: [[OR_5:%.*]] = select i1 [[TOBOOL_NOT_5]], i64 0, i64 32		; SSE2-NEXT: [[OR_5:%.*]] = select i1 [[TOBOOL_NOT_5]], i64 0, i64 32
; SSE-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i16, ptr [[SRC]], i64 6		; SSE2-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i16, ptr [[SRC]], i64 6
; SSE-NEXT: [[TMP5:%.*]] = load i16, ptr [[ARRAYIDX_6]], align 2		; SSE2-NEXT: [[TMP5:%.*]] = load i16, ptr [[ARRAYIDX_6]], align 2
; SSE-NEXT: [[TOBOOL_NOT_6:%.*]] = icmp eq i16 [[TMP5]], 0		; SSE2-NEXT: [[TOBOOL_NOT_6:%.*]] = icmp eq i16 [[TMP5]], 0
; SSE-NEXT: [[OR_6:%.*]] = select i1 [[TOBOOL_NOT_6]], i64 0, i64 64		; SSE2-NEXT: [[OR_6:%.*]] = select i1 [[TOBOOL_NOT_6]], i64 0, i64 64
; SSE-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i16, ptr [[SRC]], i64 7		; SSE2-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i16, ptr [[SRC]], i64 7
; SSE-NEXT: [[TMP6:%.*]] = load i16, ptr [[ARRAYIDX_7]], align 2		; SSE2-NEXT: [[TMP6:%.*]] = load i16, ptr [[ARRAYIDX_7]], align 2
; SSE-NEXT: [[TOBOOL_NOT_7:%.*]] = icmp eq i16 [[TMP6]], 0		; SSE2-NEXT: [[TOBOOL_NOT_7:%.*]] = icmp eq i16 [[TMP6]], 0
; SSE-NEXT: [[OR_7:%.*]] = select i1 [[TOBOOL_NOT_7]], i64 0, i64 128		; SSE2-NEXT: [[OR_7:%.*]] = select i1 [[TOBOOL_NOT_7]], i64 0, i64 128
; SSE-NEXT: [[TMP7:%.*]] = call i64 @llvm.vector.reduce.or.v4i64(<4 x i64> [[TMP3]])		; SSE2-NEXT: [[TMP7:%.*]] = call i64 @llvm.vector.reduce.or.v4i64(<4 x i64> [[TMP3]])
; SSE-NEXT: [[OP_RDX:%.*]] = or i64 [[OR_5]], [[OR_6]]		; SSE2-NEXT: [[OP_RDX:%.*]] = or i64 [[OR_5]], [[OR_6]]
; SSE-NEXT: [[OP_RDX1:%.*]] = or i64 [[OR_7]], [[OR]]		; SSE2-NEXT: [[OP_RDX1:%.*]] = or i64 [[OR_7]], [[OR]]
; SSE-NEXT: [[OP_RDX2:%.*]] = or i64 [[OP_RDX]], [[OP_RDX1]]		; SSE2-NEXT: [[OP_RDX2:%.*]] = or i64 [[OP_RDX]], [[OP_RDX1]]
; SSE-NEXT: [[OP_RDX3:%.*]] = or i64 [[TMP7]], [[OP_RDX2]]		; SSE2-NEXT: [[OP_RDX3:%.*]] = or i64 [[TMP7]], [[OP_RDX2]]
; SSE-NEXT: ret i64 [[OP_RDX3]]		; SSE2-NEXT: ret i64 [[OP_RDX3]]
		;
		; SSE4-LABEL: @bitmask_4xi16(
		; SSE4-NEXT: entry:
		; SSE4-NEXT: [[TMP0:%.*]] = load i16, ptr [[SRC:%.*]], align 2
		; SSE4-NEXT: [[TOBOOL_NOT:%.*]] = icmp ne i16 [[TMP0]], 0
		; SSE4-NEXT: [[OR:%.*]] = zext i1 [[TOBOOL_NOT]] to i64
		; SSE4-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i16, ptr [[SRC]], i64 1
		; SSE4-NEXT: [[TMP1:%.*]] = load <4 x i16>, ptr [[ARRAYIDX_1]], align 2
		; SSE4-NEXT: [[TMP2:%.*]] = icmp eq <4 x i16> [[TMP1]], zeroinitializer
		; SSE4-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP2]], <4 x i64> zeroinitializer, <4 x i64> <i64 2, i64 4, i64 8, i64 16>
		; SSE4-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i16, ptr [[SRC]], i64 5
		; SSE4-NEXT: [[TMP4:%.*]] = load <2 x i16>, ptr [[ARRAYIDX_5]], align 2
		; SSE4-NEXT: [[TMP5:%.*]] = icmp eq <2 x i16> [[TMP4]], zeroinitializer
		; SSE4-NEXT: [[TMP6:%.*]] = select <2 x i1> [[TMP5]], <2 x i64> zeroinitializer, <2 x i64> <i64 32, i64 64>
		; SSE4-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i16, ptr [[SRC]], i64 7
		; SSE4-NEXT: [[TMP7:%.*]] = load i16, ptr [[ARRAYIDX_7]], align 2
		; SSE4-NEXT: [[TOBOOL_NOT_7:%.*]] = icmp eq i16 [[TMP7]], 0
		; SSE4-NEXT: [[OR_7:%.*]] = select i1 [[TOBOOL_NOT_7]], i64 0, i64 128
		; SSE4-NEXT: [[TMP8:%.*]] = call i64 @llvm.vector.reduce.or.v4i64(<4 x i64> [[TMP3]])
		; SSE4-NEXT: [[TMP9:%.*]] = extractelement <2 x i64> [[TMP6]], i32 0
		; SSE4-NEXT: [[TMP10:%.*]] = extractelement <2 x i64> [[TMP6]], i32 1
		; SSE4-NEXT: [[OP_RDX:%.*]] = or i64 [[TMP9]], [[TMP10]]
		; SSE4-NEXT: [[OP_RDX1:%.*]] = or i64 [[OR_7]], [[OR]]
		; SSE4-NEXT: [[OP_RDX2:%.*]] = or i64 [[OP_RDX]], [[OP_RDX1]]
		; SSE4-NEXT: [[OP_RDX3:%.*]] = or i64 [[TMP8]], [[OP_RDX2]]
		; SSE4-NEXT: ret i64 [[OP_RDX3]]
;		;
; AVX-LABEL: @bitmask_4xi16(		; AVX-LABEL: @bitmask_4xi16(
; AVX-NEXT: entry:		; AVX-NEXT: entry:
; AVX-NEXT: [[TMP0:%.*]] = load i16, ptr [[SRC:%.*]], align 2		; AVX-NEXT: [[TMP0:%.*]] = load i16, ptr [[SRC:%.*]], align 2
; AVX-NEXT: [[TOBOOL_NOT:%.*]] = icmp ne i16 [[TMP0]], 0		; AVX-NEXT: [[TOBOOL_NOT:%.*]] = icmp ne i16 [[TMP0]], 0
; AVX-NEXT: [[OR:%.*]] = zext i1 [[TOBOOL_NOT]] to i64		; AVX-NEXT: [[OR:%.*]] = zext i1 [[TOBOOL_NOT]] to i64
; AVX-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i16, ptr [[SRC]], i64 1		; AVX-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i16, ptr [[SRC]], i64 1
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i16>, ptr [[ARRAYIDX_1]], align 2		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i16>, ptr [[ARRAYIDX_1]], align 2
; AVX-NEXT: [[TMP2:%.*]] = icmp eq <4 x i16> [[TMP1]], zeroinitializer		; AVX-NEXT: [[TMP2:%.*]] = icmp eq <4 x i16> [[TMP1]], zeroinitializer
; AVX-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP2]], <4 x i64> zeroinitializer, <4 x i64> <i64 2, i64 4, i64 8, i64 16>		; AVX-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP2]], <4 x i64> zeroinitializer, <4 x i64> <i64 2, i64 4, i64 8, i64 16>
; AVX-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i16, ptr [[SRC]], i64 5		; AVX-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i16, ptr [[SRC]], i64 5
; AVX-NEXT: [[TMP4:%.*]] = load i16, ptr [[ARRAYIDX_5]], align 2		; AVX-NEXT: [[TMP4:%.*]] = load <2 x i16>, ptr [[ARRAYIDX_5]], align 2
; AVX-NEXT: [[TOBOOL_NOT_5:%.*]] = icmp eq i16 [[TMP4]], 0		; AVX-NEXT: [[TMP5:%.*]] = icmp eq <2 x i16> [[TMP4]], zeroinitializer
; AVX-NEXT: [[OR_5:%.*]] = select i1 [[TOBOOL_NOT_5]], i64 0, i64 32		; AVX-NEXT: [[TMP6:%.*]] = select <2 x i1> [[TMP5]], <2 x i64> zeroinitializer, <2 x i64> <i64 32, i64 64>
; AVX-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i16, ptr [[SRC]], i64 6
; AVX-NEXT: [[TMP5:%.*]] = load i16, ptr [[ARRAYIDX_6]], align 2
; AVX-NEXT: [[TOBOOL_NOT_6:%.*]] = icmp eq i16 [[TMP5]], 0
; AVX-NEXT: [[OR_6:%.*]] = select i1 [[TOBOOL_NOT_6]], i64 0, i64 64
; AVX-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i16, ptr [[SRC]], i64 7		; AVX-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i16, ptr [[SRC]], i64 7
; AVX-NEXT: [[TMP6:%.*]] = load i16, ptr [[ARRAYIDX_7]], align 2		; AVX-NEXT: [[TMP7:%.*]] = load i16, ptr [[ARRAYIDX_7]], align 2
; AVX-NEXT: [[TOBOOL_NOT_7:%.*]] = icmp eq i16 [[TMP6]], 0		; AVX-NEXT: [[TOBOOL_NOT_7:%.*]] = icmp eq i16 [[TMP7]], 0
; AVX-NEXT: [[OR_7:%.*]] = select i1 [[TOBOOL_NOT_7]], i64 0, i64 128		; AVX-NEXT: [[OR_7:%.*]] = select i1 [[TOBOOL_NOT_7]], i64 0, i64 128
; AVX-NEXT: [[TMP7:%.*]] = call i64 @llvm.vector.reduce.or.v4i64(<4 x i64> [[TMP3]])		; AVX-NEXT: [[TMP8:%.*]] = call i64 @llvm.vector.reduce.or.v4i64(<4 x i64> [[TMP3]])
; AVX-NEXT: [[OP_RDX:%.*]] = or i64 [[OR_5]], [[OR_6]]		; AVX-NEXT: [[TMP9:%.*]] = extractelement <2 x i64> [[TMP6]], i32 0
		; AVX-NEXT: [[TMP10:%.*]] = extractelement <2 x i64> [[TMP6]], i32 1
		; AVX-NEXT: [[OP_RDX:%.*]] = or i64 [[TMP9]], [[TMP10]]
; AVX-NEXT: [[OP_RDX1:%.*]] = or i64 [[OR_7]], [[OR]]		; AVX-NEXT: [[OP_RDX1:%.*]] = or i64 [[OR_7]], [[OR]]
; AVX-NEXT: [[OP_RDX2:%.*]] = or i64 [[OP_RDX]], [[OP_RDX1]]		; AVX-NEXT: [[OP_RDX2:%.*]] = or i64 [[OP_RDX]], [[OP_RDX1]]
; AVX-NEXT: [[OP_RDX3:%.*]] = or i64 [[TMP7]], [[OP_RDX2]]		; AVX-NEXT: [[OP_RDX3:%.*]] = or i64 [[TMP8]], [[OP_RDX2]]
; AVX-NEXT: ret i64 [[OP_RDX3]]		; AVX-NEXT: ret i64 [[OP_RDX3]]
;		;
; AVX512-LABEL: @bitmask_4xi16(		; AVX512-LABEL: @bitmask_4xi16(
; AVX512-NEXT: entry:		; AVX512-NEXT: entry:
; AVX512-NEXT: [[TMP0:%.*]] = load i16, ptr [[SRC:%.*]], align 2		; AVX512-NEXT: [[TMP0:%.*]] = load i16, ptr [[SRC:%.*]], align 2
; AVX512-NEXT: [[TOBOOL_NOT:%.*]] = icmp ne i16 [[TMP0]], 0		; AVX512-NEXT: [[TOBOOL_NOT:%.*]] = icmp ne i16 [[TMP0]], 0
; AVX512-NEXT: [[OR:%.*]] = zext i1 [[TOBOOL_NOT]] to i64		; AVX512-NEXT: [[OR:%.*]] = zext i1 [[TOBOOL_NOT]] to i64
; AVX512-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i16, ptr [[SRC]], i64 1		; AVX512-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i16, ptr [[SRC]], i64 1
(55 lines collapsed in the diff view)	entry:
%7 = load i16, ptr %arrayidx.7, align 2		%7 = load i16, ptr %arrayidx.7, align 2
%tobool.not.7 = icmp eq i16 %7, 0		%tobool.not.7 = icmp eq i16 %7, 0
%or.7 = select i1 %tobool.not.7, i64 0, i64 128		%or.7 = select i1 %tobool.not.7, i64 0, i64 128
%mask.1.7 = or i64 %or.7, %mask.1.6		%mask.1.7 = or i64 %or.7, %mask.1.6
ret i64 %mask.1.7		ret i64 %mask.1.7
}		}

define i64 @bitmask_8xi32(ptr nocapture noundef readonly %src) {		define i64 @bitmask_8xi32(ptr nocapture noundef readonly %src) {
; SSE-LABEL: @bitmask_8xi32(		; SSE2-LABEL: @bitmask_8xi32(
; SSE-NEXT: entry:		; SSE2-NEXT: entry:
; SSE-NEXT: [[TMP0:%.*]] = load i32, ptr [[SRC:%.*]], align 4		; SSE2-NEXT: [[TMP0:%.*]] = load i32, ptr [[SRC:%.*]], align 4
; SSE-NEXT: [[TOBOOL_NOT:%.*]] = icmp ne i32 [[TMP0]], 0		; SSE2-NEXT: [[TOBOOL_NOT:%.*]] = icmp ne i32 [[TMP0]], 0
; SSE-NEXT: [[OR:%.*]] = zext i1 [[TOBOOL_NOT]] to i64		; SSE2-NEXT: [[OR:%.*]] = zext i1 [[TOBOOL_NOT]] to i64
; SSE-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 1		; SSE2-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 1
; SSE-NEXT: [[TMP1:%.*]] = load <4 x i32>, ptr [[ARRAYIDX_1]], align 4		; SSE2-NEXT: [[TMP1:%.*]] = load <4 x i32>, ptr [[ARRAYIDX_1]], align 4
; SSE-NEXT: [[TMP2:%.*]] = icmp eq <4 x i32> [[TMP1]], zeroinitializer		; SSE2-NEXT: [[TMP2:%.*]] = icmp eq <4 x i32> [[TMP1]], zeroinitializer
; SSE-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP2]], <4 x i64> zeroinitializer, <4 x i64> <i64 2, i64 4, i64 8, i64 16>		; SSE2-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP2]], <4 x i64> zeroinitializer, <4 x i64> <i64 2, i64 4, i64 8, i64 16>
; SSE-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 5		; SSE2-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 5
; SSE-NEXT: [[TMP4:%.*]] = load i32, ptr [[ARRAYIDX_5]], align 4		; SSE2-NEXT: [[TMP4:%.*]] = load i32, ptr [[ARRAYIDX_5]], align 4
; SSE-NEXT: [[TOBOOL_NOT_5:%.*]] = icmp eq i32 [[TMP4]], 0		; SSE2-NEXT: [[TOBOOL_NOT_5:%.*]] = icmp eq i32 [[TMP4]], 0
; SSE-NEXT: [[OR_5:%.*]] = select i1 [[TOBOOL_NOT_5]], i64 0, i64 32		; SSE2-NEXT: [[OR_5:%.*]] = select i1 [[TOBOOL_NOT_5]], i64 0, i64 32
; SSE-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 6		; SSE2-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 6
; SSE-NEXT: [[TMP5:%.*]] = load i32, ptr [[ARRAYIDX_6]], align 4		; SSE2-NEXT: [[TMP5:%.*]] = load i32, ptr [[ARRAYIDX_6]], align 4
; SSE-NEXT: [[TOBOOL_NOT_6:%.*]] = icmp eq i32 [[TMP5]], 0		; SSE2-NEXT: [[TOBOOL_NOT_6:%.*]] = icmp eq i32 [[TMP5]], 0
; SSE-NEXT: [[OR_6:%.*]] = select i1 [[TOBOOL_NOT_6]], i64 0, i64 64		; SSE2-NEXT: [[OR_6:%.*]] = select i1 [[TOBOOL_NOT_6]], i64 0, i64 64
; SSE-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 7		; SSE2-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 7
; SSE-NEXT: [[TMP6:%.*]] = load i32, ptr [[ARRAYIDX_7]], align 4		; SSE2-NEXT: [[TMP6:%.*]] = load i32, ptr [[ARRAYIDX_7]], align 4
; SSE-NEXT: [[TOBOOL_NOT_7:%.*]] = icmp eq i32 [[TMP6]], 0		; SSE2-NEXT: [[TOBOOL_NOT_7:%.*]] = icmp eq i32 [[TMP6]], 0
; SSE-NEXT: [[OR_7:%.*]] = select i1 [[TOBOOL_NOT_7]], i64 0, i64 128		; SSE2-NEXT: [[OR_7:%.*]] = select i1 [[TOBOOL_NOT_7]], i64 0, i64 128
; SSE-NEXT: [[TMP7:%.*]] = call i64 @llvm.vector.reduce.or.v4i64(<4 x i64> [[TMP3]])		; SSE2-NEXT: [[TMP7:%.*]] = call i64 @llvm.vector.reduce.or.v4i64(<4 x i64> [[TMP3]])
; SSE-NEXT: [[OP_RDX:%.*]] = or i64 [[OR_5]], [[OR_6]]		; SSE2-NEXT: [[OP_RDX:%.*]] = or i64 [[OR_5]], [[OR_6]]
; SSE-NEXT: [[OP_RDX1:%.*]] = or i64 [[OR_7]], [[OR]]		; SSE2-NEXT: [[OP_RDX1:%.*]] = or i64 [[OR_7]], [[OR]]
; SSE-NEXT: [[OP_RDX2:%.*]] = or i64 [[OP_RDX]], [[OP_RDX1]]		; SSE2-NEXT: [[OP_RDX2:%.*]] = or i64 [[OP_RDX]], [[OP_RDX1]]
; SSE-NEXT: [[OP_RDX3:%.*]] = or i64 [[TMP7]], [[OP_RDX2]]		; SSE2-NEXT: [[OP_RDX3:%.*]] = or i64 [[TMP7]], [[OP_RDX2]]
; SSE-NEXT: ret i64 [[OP_RDX3]]		; SSE2-NEXT: ret i64 [[OP_RDX3]]
		;
		; SSE4-LABEL: @bitmask_8xi32(
		; SSE4-NEXT: entry:
		; SSE4-NEXT: [[TMP0:%.*]] = load i32, ptr [[SRC:%.*]], align 4
		; SSE4-NEXT: [[TOBOOL_NOT:%.*]] = icmp ne i32 [[TMP0]], 0
		; SSE4-NEXT: [[OR:%.*]] = zext i1 [[TOBOOL_NOT]] to i64
		; SSE4-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 1
		; SSE4-NEXT: [[TMP1:%.*]] = load <4 x i32>, ptr [[ARRAYIDX_1]], align 4
		; SSE4-NEXT: [[TMP2:%.*]] = icmp eq <4 x i32> [[TMP1]], zeroinitializer
		; SSE4-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP2]], <4 x i64> zeroinitializer, <4 x i64> <i64 2, i64 4, i64 8, i64 16>
		; SSE4-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 5
		; SSE4-NEXT: [[TMP4:%.*]] = load <2 x i32>, ptr [[ARRAYIDX_5]], align 4
		; SSE4-NEXT: [[TMP5:%.*]] = icmp eq <2 x i32> [[TMP4]], zeroinitializer
		; SSE4-NEXT: [[TMP6:%.*]] = select <2 x i1> [[TMP5]], <2 x i64> zeroinitializer, <2 x i64> <i64 32, i64 64>
		; SSE4-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 7
		; SSE4-NEXT: [[TMP7:%.*]] = load i32, ptr [[ARRAYIDX_7]], align 4
		; SSE4-NEXT: [[TOBOOL_NOT_7:%.*]] = icmp eq i32 [[TMP7]], 0
		; SSE4-NEXT: [[OR_7:%.*]] = select i1 [[TOBOOL_NOT_7]], i64 0, i64 128
		; SSE4-NEXT: [[TMP8:%.*]] = call i64 @llvm.vector.reduce.or.v4i64(<4 x i64> [[TMP3]])
		; SSE4-NEXT: [[TMP9:%.*]] = extractelement <2 x i64> [[TMP6]], i32 0
		; SSE4-NEXT: [[TMP10:%.*]] = extractelement <2 x i64> [[TMP6]], i32 1
		; SSE4-NEXT: [[OP_RDX:%.*]] = or i64 [[TMP9]], [[TMP10]]
		; SSE4-NEXT: [[OP_RDX1:%.*]] = or i64 [[OR_7]], [[OR]]
		; SSE4-NEXT: [[OP_RDX2:%.*]] = or i64 [[OP_RDX]], [[OP_RDX1]]
		; SSE4-NEXT: [[OP_RDX3:%.*]] = or i64 [[TMP8]], [[OP_RDX2]]
		; SSE4-NEXT: ret i64 [[OP_RDX3]]
;		;
; AVX-LABEL: @bitmask_8xi32(		; AVX-LABEL: @bitmask_8xi32(
; AVX-NEXT: entry:		; AVX-NEXT: entry:
; AVX-NEXT: [[TMP0:%.*]] = load i32, ptr [[SRC:%.*]], align 4		; AVX-NEXT: [[TMP0:%.*]] = load i32, ptr [[SRC:%.*]], align 4
; AVX-NEXT: [[TOBOOL_NOT:%.*]] = icmp ne i32 [[TMP0]], 0		; AVX-NEXT: [[TOBOOL_NOT:%.*]] = icmp ne i32 [[TMP0]], 0
; AVX-NEXT: [[OR:%.*]] = zext i1 [[TOBOOL_NOT]] to i64		; AVX-NEXT: [[OR:%.*]] = zext i1 [[TOBOOL_NOT]] to i64
; AVX-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 1		; AVX-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 1
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i32>, ptr [[ARRAYIDX_1]], align 4		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i32>, ptr [[ARRAYIDX_1]], align 4
; AVX-NEXT: [[TMP2:%.*]] = icmp eq <4 x i32> [[TMP1]], zeroinitializer		; AVX-NEXT: [[TMP2:%.*]] = icmp eq <4 x i32> [[TMP1]], zeroinitializer
; AVX-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP2]], <4 x i64> zeroinitializer, <4 x i64> <i64 2, i64 4, i64 8, i64 16>		; AVX-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP2]], <4 x i64> zeroinitializer, <4 x i64> <i64 2, i64 4, i64 8, i64 16>
; AVX-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 5		; AVX-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 5
; AVX-NEXT: [[TMP4:%.*]] = load i32, ptr [[ARRAYIDX_5]], align 4		; AVX-NEXT: [[TMP4:%.*]] = load <2 x i32>, ptr [[ARRAYIDX_5]], align 4
; AVX-NEXT: [[TOBOOL_NOT_5:%.*]] = icmp eq i32 [[TMP4]], 0		; AVX-NEXT: [[TMP5:%.*]] = icmp eq <2 x i32> [[TMP4]], zeroinitializer
; AVX-NEXT: [[OR_5:%.*]] = select i1 [[TOBOOL_NOT_5]], i64 0, i64 32		; AVX-NEXT: [[TMP6:%.*]] = select <2 x i1> [[TMP5]], <2 x i64> zeroinitializer, <2 x i64> <i64 32, i64 64>
; AVX-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 6
; AVX-NEXT: [[TMP5:%.*]] = load i32, ptr [[ARRAYIDX_6]], align 4
; AVX-NEXT: [[TOBOOL_NOT_6:%.*]] = icmp eq i32 [[TMP5]], 0
; AVX-NEXT: [[OR_6:%.*]] = select i1 [[TOBOOL_NOT_6]], i64 0, i64 64
; AVX-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 7		; AVX-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 7
; AVX-NEXT: [[TMP6:%.*]] = load i32, ptr [[ARRAYIDX_7]], align 4		; AVX-NEXT: [[TMP7:%.*]] = load i32, ptr [[ARRAYIDX_7]], align 4
; AVX-NEXT: [[TOBOOL_NOT_7:%.*]] = icmp eq i32 [[TMP6]], 0		; AVX-NEXT: [[TOBOOL_NOT_7:%.*]] = icmp eq i32 [[TMP7]], 0
; AVX-NEXT: [[OR_7:%.*]] = select i1 [[TOBOOL_NOT_7]], i64 0, i64 128		; AVX-NEXT: [[OR_7:%.*]] = select i1 [[TOBOOL_NOT_7]], i64 0, i64 128
; AVX-NEXT: [[TMP7:%.*]] = call i64 @llvm.vector.reduce.or.v4i64(<4 x i64> [[TMP3]])		; AVX-NEXT: [[TMP8:%.*]] = call i64 @llvm.vector.reduce.or.v4i64(<4 x i64> [[TMP3]])
; AVX-NEXT: [[OP_RDX:%.*]] = or i64 [[OR_5]], [[OR_6]]		; AVX-NEXT: [[TMP9:%.*]] = extractelement <2 x i64> [[TMP6]], i32 0
		; AVX-NEXT: [[TMP10:%.*]] = extractelement <2 x i64> [[TMP6]], i32 1
		; AVX-NEXT: [[OP_RDX:%.*]] = or i64 [[TMP9]], [[TMP10]]
; AVX-NEXT: [[OP_RDX1:%.*]] = or i64 [[OR_7]], [[OR]]		; AVX-NEXT: [[OP_RDX1:%.*]] = or i64 [[OR_7]], [[OR]]
; AVX-NEXT: [[OP_RDX2:%.*]] = or i64 [[OP_RDX]], [[OP_RDX1]]		; AVX-NEXT: [[OP_RDX2:%.*]] = or i64 [[OP_RDX]], [[OP_RDX1]]
; AVX-NEXT: [[OP_RDX3:%.*]] = or i64 [[TMP7]], [[OP_RDX2]]		; AVX-NEXT: [[OP_RDX3:%.*]] = or i64 [[TMP8]], [[OP_RDX2]]
; AVX-NEXT: ret i64 [[OP_RDX3]]		; AVX-NEXT: ret i64 [[OP_RDX3]]
;		;
; AVX512-LABEL: @bitmask_8xi32(		; AVX512-LABEL: @bitmask_8xi32(
; AVX512-NEXT: entry:		; AVX512-NEXT: entry:
; AVX512-NEXT: [[TMP0:%.*]] = load i32, ptr [[SRC:%.*]], align 4		; AVX512-NEXT: [[TMP0:%.*]] = load i32, ptr [[SRC:%.*]], align 4
; AVX512-NEXT: [[TOBOOL_NOT:%.*]] = icmp ne i32 [[TMP0]], 0		; AVX512-NEXT: [[TOBOOL_NOT:%.*]] = icmp ne i32 [[TMP0]], 0
; AVX512-NEXT: [[OR:%.*]] = zext i1 [[TOBOOL_NOT]] to i64		; AVX512-NEXT: [[OR:%.*]] = zext i1 [[TOBOOL_NOT]] to i64
; AVX512-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 1		; AVX512-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 1
(107 lines collapsed in the diff view)
; SSE4-NEXT: [[TMP0:%.*]] = load i64, ptr [[SRC:%.*]], align 8		; SSE4-NEXT: [[TMP0:%.*]] = load i64, ptr [[SRC:%.*]], align 8
; SSE4-NEXT: [[TOBOOL_NOT:%.*]] = icmp ne i64 [[TMP0]], 0		; SSE4-NEXT: [[TOBOOL_NOT:%.*]] = icmp ne i64 [[TMP0]], 0
; SSE4-NEXT: [[OR:%.*]] = zext i1 [[TOBOOL_NOT]] to i64		; SSE4-NEXT: [[OR:%.*]] = zext i1 [[TOBOOL_NOT]] to i64
; SSE4-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 1		; SSE4-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 1
; SSE4-NEXT: [[TMP1:%.*]] = load <4 x i64>, ptr [[ARRAYIDX_1]], align 8		; SSE4-NEXT: [[TMP1:%.*]] = load <4 x i64>, ptr [[ARRAYIDX_1]], align 8
; SSE4-NEXT: [[TMP2:%.*]] = icmp eq <4 x i64> [[TMP1]], zeroinitializer		; SSE4-NEXT: [[TMP2:%.*]] = icmp eq <4 x i64> [[TMP1]], zeroinitializer
; SSE4-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP2]], <4 x i64> zeroinitializer, <4 x i64> <i64 2, i64 4, i64 8, i64 16>		; SSE4-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP2]], <4 x i64> zeroinitializer, <4 x i64> <i64 2, i64 4, i64 8, i64 16>
; SSE4-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 5		; SSE4-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 5
; SSE4-NEXT: [[TMP4:%.*]] = load i64, ptr [[ARRAYIDX_5]], align 8		; SSE4-NEXT: [[TMP4:%.*]] = load <2 x i64>, ptr [[ARRAYIDX_5]], align 8
; SSE4-NEXT: [[TOBOOL_NOT_5:%.*]] = icmp eq i64 [[TMP4]], 0		; SSE4-NEXT: [[TMP5:%.*]] = icmp eq <2 x i64> [[TMP4]], zeroinitializer
; SSE4-NEXT: [[OR_5:%.*]] = select i1 [[TOBOOL_NOT_5]], i64 0, i64 32		; SSE4-NEXT: [[TMP6:%.*]] = select <2 x i1> [[TMP5]], <2 x i64> zeroinitializer, <2 x i64> <i64 32, i64 64>
; SSE4-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 6
; SSE4-NEXT: [[TMP5:%.*]] = load i64, ptr [[ARRAYIDX_6]], align 8
; SSE4-NEXT: [[TOBOOL_NOT_6:%.*]] = icmp eq i64 [[TMP5]], 0
; SSE4-NEXT: [[OR_6:%.*]] = select i1 [[TOBOOL_NOT_6]], i64 0, i64 64
; SSE4-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 7		; SSE4-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 7
; SSE4-NEXT: [[TMP6:%.*]] = load i64, ptr [[ARRAYIDX_7]], align 8		; SSE4-NEXT: [[TMP7:%.*]] = load i64, ptr [[ARRAYIDX_7]], align 8
; SSE4-NEXT: [[TOBOOL_NOT_7:%.*]] = icmp eq i64 [[TMP6]], 0		; SSE4-NEXT: [[TOBOOL_NOT_7:%.*]] = icmp eq i64 [[TMP7]], 0
; SSE4-NEXT: [[OR_7:%.*]] = select i1 [[TOBOOL_NOT_7]], i64 0, i64 128		; SSE4-NEXT: [[OR_7:%.*]] = select i1 [[TOBOOL_NOT_7]], i64 0, i64 128
; SSE4-NEXT: [[TMP7:%.*]] = call i64 @llvm.vector.reduce.or.v4i64(<4 x i64> [[TMP3]])		; SSE4-NEXT: [[TMP8:%.*]] = call i64 @llvm.vector.reduce.or.v4i64(<4 x i64> [[TMP3]])
; SSE4-NEXT: [[OP_RDX:%.*]] = or i64 [[OR_5]], [[OR_6]]		; SSE4-NEXT: [[TMP9:%.*]] = extractelement <2 x i64> [[TMP6]], i32 0
		; SSE4-NEXT: [[TMP10:%.*]] = extractelement <2 x i64> [[TMP6]], i32 1
		; SSE4-NEXT: [[OP_RDX:%.*]] = or i64 [[TMP9]], [[TMP10]]
; SSE4-NEXT: [[OP_RDX1:%.*]] = or i64 [[OR_7]], [[OR]]		; SSE4-NEXT: [[OP_RDX1:%.*]] = or i64 [[OR_7]], [[OR]]
; SSE4-NEXT: [[OP_RDX2:%.*]] = or i64 [[OP_RDX]], [[OP_RDX1]]		; SSE4-NEXT: [[OP_RDX2:%.*]] = or i64 [[OP_RDX]], [[OP_RDX1]]
; SSE4-NEXT: [[OP_RDX3:%.*]] = or i64 [[TMP7]], [[OP_RDX2]]		; SSE4-NEXT: [[OP_RDX3:%.*]] = or i64 [[TMP8]], [[OP_RDX2]]
; SSE4-NEXT: ret i64 [[OP_RDX3]]		; SSE4-NEXT: ret i64 [[OP_RDX3]]
;		;
; AVX-LABEL: @bitmask_8xi64(		; AVX-LABEL: @bitmask_8xi64(
; AVX-NEXT: entry:		; AVX-NEXT: entry:
; AVX-NEXT: [[TMP0:%.*]] = load i64, ptr [[SRC:%.*]], align 8		; AVX-NEXT: [[TMP0:%.*]] = load i64, ptr [[SRC:%.*]], align 8
; AVX-NEXT: [[TOBOOL_NOT:%.*]] = icmp ne i64 [[TMP0]], 0		; AVX-NEXT: [[TOBOOL_NOT:%.*]] = icmp ne i64 [[TMP0]], 0
; AVX-NEXT: [[OR:%.*]] = zext i1 [[TOBOOL_NOT]] to i64		; AVX-NEXT: [[OR:%.*]] = zext i1 [[TOBOOL_NOT]] to i64
; AVX-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 1		; AVX-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 1
; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, ptr [[ARRAYIDX_1]], align 8		; AVX-NEXT: [[TMP1:%.*]] = load <4 x i64>, ptr [[ARRAYIDX_1]], align 8
; AVX-NEXT: [[TMP2:%.*]] = icmp eq <4 x i64> [[TMP1]], zeroinitializer		; AVX-NEXT: [[TMP2:%.*]] = icmp eq <4 x i64> [[TMP1]], zeroinitializer
; AVX-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP2]], <4 x i64> zeroinitializer, <4 x i64> <i64 2, i64 4, i64 8, i64 16>		; AVX-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP2]], <4 x i64> zeroinitializer, <4 x i64> <i64 2, i64 4, i64 8, i64 16>
; AVX-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 5		; AVX-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 5
; AVX-NEXT: [[TMP4:%.*]] = load i64, ptr [[ARRAYIDX_5]], align 8		; AVX-NEXT: [[TMP4:%.*]] = load <2 x i64>, ptr [[ARRAYIDX_5]], align 8
; AVX-NEXT: [[TOBOOL_NOT_5:%.*]] = icmp eq i64 [[TMP4]], 0		; AVX-NEXT: [[TMP5:%.*]] = icmp eq <2 x i64> [[TMP4]], zeroinitializer
; AVX-NEXT: [[OR_5:%.*]] = select i1 [[TOBOOL_NOT_5]], i64 0, i64 32		; AVX-NEXT: [[TMP6:%.*]] = select <2 x i1> [[TMP5]], <2 x i64> zeroinitializer, <2 x i64> <i64 32, i64 64>
; AVX-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 6
; AVX-NEXT: [[TMP5:%.*]] = load i64, ptr [[ARRAYIDX_6]], align 8
; AVX-NEXT: [[TOBOOL_NOT_6:%.*]] = icmp eq i64 [[TMP5]], 0
; AVX-NEXT: [[OR_6:%.*]] = select i1 [[TOBOOL_NOT_6]], i64 0, i64 64
; AVX-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 7		; AVX-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 7
; AVX-NEXT: [[TMP6:%.*]] = load i64, ptr [[ARRAYIDX_7]], align 8		; AVX-NEXT: [[TMP7:%.*]] = load i64, ptr [[ARRAYIDX_7]], align 8
; AVX-NEXT: [[TOBOOL_NOT_7:%.*]] = icmp eq i64 [[TMP6]], 0		; AVX-NEXT: [[TOBOOL_NOT_7:%.*]] = icmp eq i64 [[TMP7]], 0
; AVX-NEXT: [[OR_7:%.*]] = select i1 [[TOBOOL_NOT_7]], i64 0, i64 128		; AVX-NEXT: [[OR_7:%.*]] = select i1 [[TOBOOL_NOT_7]], i64 0, i64 128
; AVX-NEXT: [[TMP7:%.*]] = call i64 @llvm.vector.reduce.or.v4i64(<4 x i64> [[TMP3]])		; AVX-NEXT: [[TMP8:%.*]] = call i64 @llvm.vector.reduce.or.v4i64(<4 x i64> [[TMP3]])
; AVX-NEXT: [[OP_RDX:%.*]] = or i64 [[OR_5]], [[OR_6]]		; AVX-NEXT: [[TMP9:%.*]] = extractelement <2 x i64> [[TMP6]], i32 0
		; AVX-NEXT: [[TMP10:%.*]] = extractelement <2 x i64> [[TMP6]], i32 1
		; AVX-NEXT: [[OP_RDX:%.*]] = or i64 [[TMP9]], [[TMP10]]
; AVX-NEXT: [[OP_RDX1:%.*]] = or i64 [[OR_7]], [[OR]]		; AVX-NEXT: [[OP_RDX1:%.*]] = or i64 [[OR_7]], [[OR]]
; AVX-NEXT: [[OP_RDX2:%.*]] = or i64 [[OP_RDX]], [[OP_RDX1]]		; AVX-NEXT: [[OP_RDX2:%.*]] = or i64 [[OP_RDX]], [[OP_RDX1]]
; AVX-NEXT: [[OP_RDX3:%.*]] = or i64 [[TMP7]], [[OP_RDX2]]		; AVX-NEXT: [[OP_RDX3:%.*]] = or i64 [[TMP8]], [[OP_RDX2]]
; AVX-NEXT: ret i64 [[OP_RDX3]]		; AVX-NEXT: ret i64 [[OP_RDX3]]
;		;
; AVX512-LABEL: @bitmask_8xi64(		; AVX512-LABEL: @bitmask_8xi64(
; AVX512-NEXT: entry:		; AVX512-NEXT: entry:
; AVX512-NEXT: [[TMP0:%.*]] = load i64, ptr [[SRC:%.*]], align 8		; AVX512-NEXT: [[TMP0:%.*]] = load i64, ptr [[SRC:%.*]], align 8
; AVX512-NEXT: [[TOBOOL_NOT:%.*]] = icmp ne i64 [[TMP0]], 0		; AVX512-NEXT: [[TOBOOL_NOT:%.*]] = icmp ne i64 [[TMP0]], 0
; AVX512-NEXT: [[OR:%.*]] = zext i1 [[TOBOOL_NOT]] to i64		; AVX512-NEXT: [[OR:%.*]] = zext i1 [[TOBOOL_NOT]] to i64
; AVX512-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 1		; AVX512-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 1
(61 lines collapsed in the diff view)

llvm/test/Transforms/SLPVectorizer/X86/c-ray.ll

	(67 lines collapsed in the diff view)
	; CHECK-NEXT: [[TMP26:%.*]] = insertelement <2 x double> poison, double [[FNEG87]], i32 0			; CHECK-NEXT: [[TMP26:%.*]] = insertelement <2 x double> poison, double [[FNEG87]], i32 0
	; CHECK-NEXT: [[TMP27:%.*]] = insertelement <2 x double> [[TMP26]], double [[CALL]], i32 1			; CHECK-NEXT: [[TMP27:%.*]] = insertelement <2 x double> [[TMP26]], double [[CALL]], i32 1
	; CHECK-NEXT: [[TMP28:%.*]] = insertelement <2 x double> poison, double [[CALL]], i32 0			; CHECK-NEXT: [[TMP28:%.*]] = insertelement <2 x double> poison, double [[CALL]], i32 0
	; CHECK-NEXT: [[TMP29:%.*]] = insertelement <2 x double> [[TMP28]], double [[TMP12]], i32 1			; CHECK-NEXT: [[TMP29:%.*]] = insertelement <2 x double> [[TMP28]], double [[TMP12]], i32 1
	; CHECK-NEXT: [[TMP30:%.*]] = fsub <2 x double> [[TMP27]], [[TMP29]]			; CHECK-NEXT: [[TMP30:%.*]] = fsub <2 x double> [[TMP27]], [[TMP29]]
	; CHECK-NEXT: [[TMP31:%.*]] = insertelement <2 x double> poison, double [[MUL88]], i32 0			; CHECK-NEXT: [[TMP31:%.*]] = insertelement <2 x double> poison, double [[MUL88]], i32 0
	; CHECK-NEXT: [[TMP32:%.*]] = insertelement <2 x double> [[TMP31]], double [[MUL88]], i32 1			; CHECK-NEXT: [[TMP32:%.*]] = insertelement <2 x double> [[TMP31]], double [[MUL88]], i32 1
	; CHECK-NEXT: [[TMP33:%.*]] = fdiv <2 x double> [[TMP30]], [[TMP32]]			; CHECK-NEXT: [[TMP33:%.*]] = fdiv <2 x double> [[TMP30]], [[TMP32]]
	; CHECK-NEXT: [[TMP34:%.*]] = extractelement <2 x double> [[TMP33]], i32 1			; CHECK-NEXT: [[TMP34:%.*]] = fcmp olt <2 x double> [[TMP33]], <double 0x3EB0C6F7A0B5ED8D, double 0x3EB0C6F7A0B5ED8D>
	; CHECK-NEXT: [[CMP93:%.*]] = fcmp olt double [[TMP34]], 0x3EB0C6F7A0B5ED8D			; CHECK-NEXT: [[TMP35:%.*]] = extractelement <2 x i1> [[TMP34]], i32 0
	; CHECK-NEXT: [[TMP35:%.*]] = extractelement <2 x double> [[TMP33]], i32 0			; CHECK-NEXT: [[TMP36:%.*]] = extractelement <2 x i1> [[TMP34]], i32 1
	; CHECK-NEXT: [[CMP94:%.*]] = fcmp olt double [[TMP35]], 0x3EB0C6F7A0B5ED8D			; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[TMP36]], i1 [[TMP35]], i1 false
	; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[CMP93]], i1 [[CMP94]], i1 false
	; CHECK-NEXT: br i1 [[OR_COND]], label [[CLEANUP]], label [[LOR_LHS_FALSE:%.*]]			; CHECK-NEXT: br i1 [[OR_COND]], label [[CLEANUP]], label [[LOR_LHS_FALSE:%.*]]
	; CHECK: lor.lhs.false:			; CHECK: lor.lhs.false:
	; CHECK-NEXT: [[TMP36:%.*]] = fcmp ule <2 x double> [[TMP33]], <double 1.000000e+00, double 1.000000e+00>			; CHECK-NEXT: [[TMP37:%.*]] = fcmp ule <2 x double> [[TMP33]], <double 1.000000e+00, double 1.000000e+00>
	; CHECK-NEXT: [[TMP37:%.*]] = extractelement <2 x i1> [[TMP36]], i32 0			; CHECK-NEXT: [[TMP38:%.*]] = extractelement <2 x i1> [[TMP37]], i32 0
	; CHECK-NEXT: [[TMP38:%.*]] = extractelement <2 x i1> [[TMP36]], i32 1			; CHECK-NEXT: [[TMP39:%.*]] = extractelement <2 x i1> [[TMP37]], i32 1
	; CHECK-NEXT: [[OR_COND106:%.*]] = select i1 [[TMP38]], i1 true, i1 [[TMP37]]			; CHECK-NEXT: [[OR_COND106:%.*]] = select i1 [[TMP39]], i1 true, i1 [[TMP38]]
	; CHECK-NEXT: [[SPEC_SELECT:%.*]] = zext i1 [[OR_COND106]] to i32			; CHECK-NEXT: [[SPEC_SELECT:%.*]] = zext i1 [[OR_COND106]] to i32
	; CHECK-NEXT: br label [[CLEANUP]]			; CHECK-NEXT: br label [[CLEANUP]]
	; CHECK: cleanup:			; CHECK: cleanup:
	; CHECK-NEXT: [[RETVAL_0:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ 0, [[IF_END]] ], [ [[SPEC_SELECT]], [[LOR_LHS_FALSE]] ]			; CHECK-NEXT: [[RETVAL_0:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ 0, [[IF_END]] ], [ [[SPEC_SELECT]], [[LOR_LHS_FALSE]] ]
	; CHECK-NEXT: ret i32 [[RETVAL_0]]			; CHECK-NEXT: ret i32 [[RETVAL_0]]
	;			;
	entry:			entry:
	%dir = getelementptr inbounds %struct.ray, ptr %ray, i64 0, i32 1			%dir = getelementptr inbounds %struct.ray, ptr %ray, i64 0, i32 1
	(76 lines collapsed in the diff view)

llvm/test/Transforms/SLPVectorizer/X86/crash_binaryop.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s

	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-darwin13.3.0"			target triple = "x86_64-apple-darwin13.3.0"

	@a = common global double 0.000000e+00, align 8			@a = common global double 0.000000e+00, align 8

	define i32 @fn1() {			define i32 @fn1() {
	; CHECK-LABEL: @fn1(			; CHECK-LABEL: @fn1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[INIT:%.*]] = load double, double* @a, align 8			; CHECK-NEXT: [[INIT:%.*]] = load double, double* @a, align 8
				; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[INIT]], i32 0
				; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[INIT]], i32 1
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[PHI:%.*]] = phi double [ [[ADD2:%.*]], [[LOOP]] ], [ [[INIT]], [[ENTRY:%.*]] ]			; CHECK-NEXT: [[PHI:%.*]] = phi double [ [[ADD2:%.*]], [[LOOP]] ], [ [[INIT]], [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[POSTADD1_PHI:%.*]] = phi double [ [[POSTADD1:%.*]], [[LOOP]] ], [ [[INIT]], [[ENTRY]] ]			; CHECK-NEXT: [[TMP2:%.*]] = phi <2 x double> [ [[TMP9:%.*]], [[LOOP]] ], [ [[TMP1]], [[ENTRY]] ]
	; CHECK-NEXT: [[POSTADD2_PHI:%.*]] = phi double [ [[POSTADD2:%.*]], [[LOOP]] ], [ [[INIT]], [[ENTRY]] ]			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x double> [[TMP2]], i32 0
	; CHECK-NEXT: [[ADD1:%.*]] = fadd double [[POSTADD1_PHI]], undef			; CHECK-NEXT: [[ADD1:%.*]] = fadd double [[TMP3]], undef
	; CHECK-NEXT: [[ADD2]] = fadd double [[POSTADD2_PHI]], [[PHI]]			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x double> [[TMP2]], i32 1
				; CHECK-NEXT: [[ADD2]] = fadd double [[TMP4]], [[PHI]]
	; CHECK-NEXT: [[MUL2:%.*]] = fmul double [[ADD2]], 0.000000e+00			; CHECK-NEXT: [[MUL2:%.*]] = fmul double [[ADD2]], 0.000000e+00
	; CHECK-NEXT: [[BINARYOP_B:%.*]] = fadd double [[POSTADD1_PHI]], [[MUL2]]			; CHECK-NEXT: [[BINARYOP_B:%.*]] = fadd double [[TMP3]], [[MUL2]]
	; CHECK-NEXT: [[MUL1:%.*]] = fmul double [[ADD1]], 0.000000e+00			; CHECK-NEXT: [[MUL1:%.*]] = fmul double [[ADD1]], 0.000000e+00
	; CHECK-NEXT: [[TMP:%.*]] = fadd double [[POSTADD2_PHI]], 0.000000e+00			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[MUL1]], i32 0
	; CHECK-NEXT: [[BINARY_V:%.*]] = fadd double [[MUL1]], [[BINARYOP_B]]			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> [[TMP2]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[POSTADD1]] = fadd double [[BINARY_V]], 0.000000e+00			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> <double poison, double 0.000000e+00>, double [[BINARYOP_B]], i32 0
	; CHECK-NEXT: [[POSTADD2]] = fadd double [[TMP]], 1.000000e+00			; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x double> [[TMP6]], [[TMP7]]
	; CHECK-NEXT: [[TOBOOL:%.*]] = fcmp une double [[POSTADD1]], 0.000000e+00			; CHECK-NEXT: [[TMP9]] = fadd <2 x double> [[TMP8]], <double 0.000000e+00, double 1.000000e+00>
				; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x double> [[TMP9]], i32 0
				; CHECK-NEXT: [[TOBOOL:%.*]] = fcmp une double [[TMP10]], 0.000000e+00
	; CHECK-NEXT: br i1 [[TOBOOL]], label [[EXIT:%.*]], label [[LOOP]]			; CHECK-NEXT: br i1 [[TOBOOL]], label [[EXIT:%.*]], label [[LOOP]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret i32 1			; CHECK-NEXT: ret i32 1
	;			;
	entry:			entry:
	%init = load double, double* @a, align 8			%init = load double, double* @a, align 8
	br label %loop			br label %loop

	(29 lines collapsed in the diff view)

llvm/test/Transforms/SLPVectorizer/X86/crash_bullet.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.8.0"			target triple = "x86_64-apple-macosx10.8.0"

	%"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960" = type { i32, i32 }			%"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960" = type { i32, i32 }

	define void @_ZN23btGeneric6DofConstraint8getInfo1EPN17btTypedConstraint17btConstraintInfo1E(%"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960"* nocapture %info) {			define void @_ZN23btGeneric6DofConstraint8getInfo1EPN17btTypedConstraint17btConstraintInfo1E(%"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960"* nocapture %info) {
	; CHECK-LABEL: @_ZN23btGeneric6DofConstraint8getInfo1EPN17btTypedConstraint17btConstraintInfo1E(			; CHECK-LABEL: @_ZN23btGeneric6DofConstraint8getInfo1EPN17btTypedConstraint17btConstraintInfo1E(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 undef, label [[IF_ELSE:%.*]], label [[IF_THEN:%.*]]			; CHECK-NEXT: br i1 undef, label [[IF_ELSE:%.*]], label [[IF_THEN:%.*]]
	; CHECK: if.then:			; CHECK: if.then:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: if.else:			; CHECK: if.else:
	; CHECK-NEXT: [[M_NUMCONSTRAINTROWS4:%.*]] = getelementptr inbounds %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960", %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960"* [[INFO:%.*]], i64 0, i32 0			; CHECK-NEXT: [[M_NUMCONSTRAINTROWS4:%.*]] = getelementptr inbounds %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960", %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960"* [[INFO:%.*]], i64 0, i32 0
				; CHECK-NEXT: [[NUB5:%.*]] = getelementptr inbounds %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960", %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960"* [[INFO]], i64 0, i32 1
	; CHECK-NEXT: br i1 undef, label [[LAND_LHS_TRUE_I_1:%.*]], label [[IF_THEN7_1:%.*]]			; CHECK-NEXT: br i1 undef, label [[LAND_LHS_TRUE_I_1:%.*]], label [[IF_THEN7_1:%.*]]
	; CHECK: land.lhs.true.i.1:			; CHECK: land.lhs.true.i.1:
	; CHECK-NEXT: br i1 undef, label [[FOR_INC_1:%.*]], label [[IF_THEN7_1]]			; CHECK-NEXT: br i1 undef, label [[FOR_INC_1:%.*]], label [[IF_THEN7_1]]
	; CHECK: if.then7.1:			; CHECK: if.then7.1:
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[M_NUMCONSTRAINTROWS4]] to <2 x i32>*			; CHECK-NEXT: store i32 1, i32* [[M_NUMCONSTRAINTROWS4]], align 4
	; CHECK-NEXT: store <2 x i32> <i32 1, i32 5>, <2 x i32>* [[TMP0]], align 4			; CHECK-NEXT: store i32 5, i32* [[NUB5]], align 4
	; CHECK-NEXT: br label [[FOR_INC_1]]			; CHECK-NEXT: br label [[FOR_INC_1]]
	; CHECK: for.inc.1:			; CHECK: for.inc.1:
	; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ <i32 1, i32 5>, [[IF_THEN7_1]] ], [ <i32 0, i32 6>, [[LAND_LHS_TRUE_I_1]] ]			; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x i32> [ <i32 1, i32 5>, [[IF_THEN7_1]] ], [ <i32 0, i32 6>, [[LAND_LHS_TRUE_I_1]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = add nsw <2 x i32> [[TMP1]], <i32 1, i32 -1>			; CHECK-NEXT: [[TMP1:%.*]] = add nsw <2 x i32> [[TMP0]], <i32 1, i32 -1>
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[M_NUMCONSTRAINTROWS4]] to <2 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[M_NUMCONSTRAINTROWS4]] to <2 x i32>*
	; CHECK-NEXT: store <2 x i32> [[TMP2]], <2 x i32>* [[TMP3]], align 4			; CHECK-NEXT: store <2 x i32> [[TMP1]], <2 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;
	entry:			entry:
	br i1 undef, label %if.else, label %if.then			br i1 undef, label %if.else, label %if.then

	if.then: ; preds = %entry			if.then: ; preds = %entry
	ret void			ret void


llvm/test/Transforms/SLPVectorizer/X86/crash_reordering_undefs.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux -mcpu=corei7-avx \| FileCheck %s			; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux -mcpu=corei7-avx \| FileCheck %s

	define i32 @crash_reordering_undefs() {			define i32 @crash_reordering_undefs() {
	; CHECK-LABEL: @crash_reordering_undefs(			; CHECK-LABEL: @crash_reordering_undefs(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[OR0:%.*]] = or i64 undef, undef			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i64> undef, <4 x i64> poison, <4 x i32> <i32 0, i32 undef, i32 undef, i32 1>
	; CHECK-NEXT: [[CMP0:%.*]] = icmp eq i64 undef, [[OR0]]			; CHECK-NEXT: [[TMP0:%.*]] = icmp eq <4 x i64> undef, [[SHUFFLE]]
	; CHECK-NEXT: [[ADD0:%.*]] = select i1 [[CMP0]], i32 65536, i32 65537			; CHECK-NEXT: [[TMP1:%.*]] = select <4 x i1> [[TMP0]], <4 x i32> <i32 65536, i32 65536, i32 65536, i32 65536>, <4 x i32> <i32 65537, i32 65537, i32 65537, i32 65537>
	; CHECK-NEXT: [[CMP1:%.*]] = icmp eq i64 undef, undef			; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> undef)
	; CHECK-NEXT: [[ADD2:%.*]] = select i1 [[CMP1]], i32 65536, i32 65537			; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP1]])
	; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i64 undef, undef			; CHECK-NEXT: [[OP_RDX:%.*]] = add i32 [[TMP2]], [[TMP3]]
	; CHECK-NEXT: [[ADD4:%.*]] = select i1 [[CMP2]], i32 65536, i32 65537			; CHECK-NEXT: [[OP_RDX1:%.*]] = add i32 [[OP_RDX]], undef
	; CHECK-NEXT: [[OR1:%.*]] = or i64 undef, undef			; CHECK-NEXT: ret i32 [[OP_RDX1]]
	; CHECK-NEXT: [[CMP3:%.*]] = icmp eq i64 undef, [[OR1]]
	; CHECK-NEXT: [[ADD9:%.*]] = select i1 [[CMP3]], i32 65536, i32 65537
	; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> undef)
	; CHECK-NEXT: [[OP_RDX:%.*]] = add i32 undef, [[ADD0]]
	; CHECK-NEXT: [[OP_RDX1:%.*]] = add i32 [[ADD2]], [[ADD4]]
	; CHECK-NEXT: [[OP_RDX2:%.*]] = add i32 [[OP_RDX]], [[OP_RDX1]]
	; CHECK-NEXT: [[OP_RDX3:%.*]] = add i32 [[OP_RDX2]], [[ADD9]]
	; CHECK-NEXT: [[OP_RDX4:%.*]] = add i32 [[TMP0]], [[OP_RDX3]]
	; CHECK-NEXT: ret i32 [[OP_RDX4]]
	;			;
	entry:			entry:
	%or0 = or i64 undef, undef			%or0 = or i64 undef, undef
	%cmp0 = icmp eq i64 undef, %or0			%cmp0 = icmp eq i64 undef, %or0
	%add0 = select i1 %cmp0, i32 65536, i32 65537			%add0 = select i1 %cmp0, i32 65536, i32 65537
	%add1 = add i32 undef, %add0			%add1 = add i32 undef, %add0
	%cmp1 = icmp eq i64 undef, undef			%cmp1 = icmp eq i64 undef, undef
	%add2 = select i1 %cmp1, i32 65536, i32 65537			%add2 = select i1 %cmp1, i32 65536, i32 65537

llvm/test/Transforms/SLPVectorizer/X86/crash_sim4b1.ll

	; CHECK-NEXT: br i1 undef, label [[LAND_LHS_TRUE:%.*]], label [[LAND_LHS_TRUE167:%.*]]			; CHECK-NEXT: br i1 undef, label [[LAND_LHS_TRUE:%.*]], label [[LAND_LHS_TRUE167:%.*]]
	; CHECK: land.lhs.true:			; CHECK: land.lhs.true:
	; CHECK-NEXT: br i1 undef, label [[IF_THEN17:%.*]], label [[LAND_LHS_TRUE167]]			; CHECK-NEXT: br i1 undef, label [[IF_THEN17:%.*]], label [[LAND_LHS_TRUE167]]
	; CHECK: if.then17:			; CHECK: if.then17:
	; CHECK-NEXT: br i1 undef, label [[IF_END98:%.*]], label [[LAND_RHS_LR_PH:%.*]]			; CHECK-NEXT: br i1 undef, label [[IF_END98:%.*]], label [[LAND_RHS_LR_PH:%.*]]
	; CHECK: land.rhs.lr.ph:			; CHECK: land.rhs.lr.ph:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: if.end98:			; CHECK: if.end98:
				; CHECK-NEXT: [[FROM299:%.*]] = getelementptr inbounds [[STRUCT__EXON_T_12_103_220_363_480_649_740_857_1039_1065_1078_1091_1117_1130_1156_1169_1195_1221_1234_1286_1299_1312_1338_1429_1455_1468_1494_1520_1884_1897_1975_2066_2105_2170_2171:%.*]], %struct._exon_t.12.103.220.363.480.649.740.857.1039.1065.1078.1091.1117.1130.1156.1169.1195.1221.1234.1286.1299.1312.1338.1429.1455.1468.1494.1520.1884.1897.1975.2066.2105.2170.2171* undef, i64 0, i32 1
	; CHECK-NEXT: br i1 undef, label [[LAND_LHS_TRUE167]], label [[IF_THEN103:%.*]]			; CHECK-NEXT: br i1 undef, label [[LAND_LHS_TRUE167]], label [[IF_THEN103:%.*]]
	; CHECK: if.then103:			; CHECK: if.then103:
	; CHECK-NEXT: [[FROM1115:%.*]] = getelementptr inbounds [[STRUCT__EXON_T_12_103_220_363_480_649_740_857_1039_1065_1078_1091_1117_1130_1156_1169_1195_1221_1234_1286_1299_1312_1338_1429_1455_1468_1494_1520_1884_1897_1975_2066_2105_2170_2171:%.*]], %struct._exon_t.12.103.220.363.480.649.740.857.1039.1065.1078.1091.1117.1130.1156.1169.1195.1221.1234.1286.1299.1312.1338.1429.1455.1468.1494.1520.1884.1897.1975.2066.2105.2170.2171* undef, i64 0, i32 0
	; CHECK-NEXT: [[DOTSUB100:%.*]] = select i1 undef, i32 250, i32 undef			; CHECK-NEXT: [[DOTSUB100:%.*]] = select i1 undef, i32 250, i32 undef
	; CHECK-NEXT: [[MUL114:%.*]] = shl nsw i32 [[DOTSUB100]], 2			; CHECK-NEXT: [[MUL114:%.*]] = shl nsw i32 [[DOTSUB100]], 2
				; CHECK-NEXT: [[FROM1115:%.*]] = getelementptr inbounds [[STRUCT__EXON_T_12_103_220_363_480_649_740_857_1039_1065_1078_1091_1117_1130_1156_1169_1195_1221_1234_1286_1299_1312_1338_1429_1455_1468_1494_1520_1884_1897_1975_2066_2105_2170_2171]], %struct._exon_t.12.103.220.363.480.649.740.857.1039.1065.1078.1091.1117.1130.1156.1169.1195.1221.1234.1286.1299.1312.1338.1429.1455.1468.1494.1520.1884.1897.1975.2066.2105.2170.2171* undef, i64 0, i32 0
	; CHECK-NEXT: [[COND125:%.*]] = select i1 undef, i32 undef, i32 [[MUL114]]			; CHECK-NEXT: [[COND125:%.*]] = select i1 undef, i32 undef, i32 [[MUL114]]
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> poison, i32 [[COND125]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> [[TMP0]], i32 [[DOTSUB100]], i32 1
	; CHECK-NEXT: br label [[FOR_COND_I:%.*]]			; CHECK-NEXT: br label [[FOR_COND_I:%.*]]
	; CHECK: for.cond.i:			; CHECK: for.cond.i:
	; CHECK-NEXT: [[TMP2:%.*]] = phi <2 x i32> [ undef, [[LAND_RHS_I874:%.*]] ], [ [[TMP1]], [[IF_THEN103]] ]			; CHECK-NEXT: [[ROW_0_I:%.*]] = phi i32 [ undef, [[LAND_RHS_I874:%.*]] ], [ [[DOTSUB100]], [[IF_THEN103]] ]
				; CHECK-NEXT: [[COL_0_I:%.*]] = phi i32 [ undef, [[LAND_RHS_I874]] ], [ [[COND125]], [[IF_THEN103]] ]
	; CHECK-NEXT: br i1 undef, label [[LAND_RHS_I874]], label [[FOR_END_I:%.*]]			; CHECK-NEXT: br i1 undef, label [[LAND_RHS_I874]], label [[FOR_END_I:%.*]]
	; CHECK: land.rhs.i874:			; CHECK: land.rhs.i874:
	; CHECK-NEXT: br i1 undef, label [[FOR_COND_I]], label [[FOR_END_I]]			; CHECK-NEXT: br i1 undef, label [[FOR_COND_I]], label [[FOR_END_I]]
	; CHECK: for.end.i:			; CHECK: for.end.i:
	; CHECK-NEXT: br i1 undef, label [[IF_THEN_I:%.*]], label [[IF_END_I:%.*]]			; CHECK-NEXT: br i1 undef, label [[IF_THEN_I:%.*]], label [[IF_END_I:%.*]]
	; CHECK: if.then.i:			; CHECK: if.then.i:
	; CHECK-NEXT: [[TMP3:%.*]] = add nsw <2 x i32> [[TMP2]], undef			; CHECK-NEXT: [[ADD14_I:%.*]] = add nsw i32 [[ROW_0_I]], undef
				; CHECK-NEXT: [[ADD15_I:%.*]] = add nsw i32 [[COL_0_I]], undef
	; CHECK-NEXT: br label [[EXTEND_BW_EXIT:%.*]]			; CHECK-NEXT: br label [[EXTEND_BW_EXIT:%.*]]
	; CHECK: if.end.i:			; CHECK: if.end.i:
	; CHECK-NEXT: [[ADD16_I:%.*]] = add i32 [[COND125]], [[DOTSUB100]]			; CHECK-NEXT: [[ADD16_I:%.*]] = add i32 [[COND125]], [[DOTSUB100]]
	; CHECK-NEXT: [[CMP26514_I:%.*]] = icmp slt i32 [[ADD16_I]], 0			; CHECK-NEXT: [[CMP26514_I:%.*]] = icmp slt i32 [[ADD16_I]], 0
	; CHECK-NEXT: br i1 [[CMP26514_I]], label [[FOR_END33_I:%.*]], label [[FOR_BODY28_LR_PH_I:%.*]]			; CHECK-NEXT: br i1 [[CMP26514_I]], label [[FOR_END33_I:%.*]], label [[FOR_BODY28_LR_PH_I:%.*]]
	; CHECK: for.body28.lr.ph.i:			; CHECK: for.body28.lr.ph.i:
	; CHECK-NEXT: br label [[FOR_END33_I]]			; CHECK-NEXT: br label [[FOR_END33_I]]
	; CHECK: for.end33.i:			; CHECK: for.end33.i:
	; CHECK-NEXT: br i1 undef, label [[FOR_END58_I:%.*]], label [[FOR_BODY52_LR_PH_I:%.*]]			; CHECK-NEXT: br i1 undef, label [[FOR_END58_I:%.*]], label [[FOR_BODY52_LR_PH_I:%.*]]
	; CHECK: for.body52.lr.ph.i:			; CHECK: for.body52.lr.ph.i:
	; CHECK-NEXT: br label [[FOR_END58_I]]			; CHECK-NEXT: br label [[FOR_END58_I]]
	; CHECK: for.end58.i:			; CHECK: for.end58.i:
	; CHECK-NEXT: br label [[WHILE_COND260_I:%.*]]			; CHECK-NEXT: br label [[WHILE_COND260_I:%.*]]
	; CHECK: while.cond260.i:			; CHECK: while.cond260.i:
	; CHECK-NEXT: br i1 undef, label [[LAND_RHS263_I:%.*]], label [[WHILE_END275_I:%.*]]			; CHECK-NEXT: br i1 undef, label [[LAND_RHS263_I:%.*]], label [[WHILE_END275_I:%.*]]
	; CHECK: land.rhs263.i:			; CHECK: land.rhs263.i:
	; CHECK-NEXT: br i1 undef, label [[WHILE_COND260_I]], label [[WHILE_END275_I]]			; CHECK-NEXT: br i1 undef, label [[WHILE_COND260_I]], label [[WHILE_END275_I]]
	; CHECK: while.end275.i:			; CHECK: while.end275.i:
	; CHECK-NEXT: br label [[EXTEND_BW_EXIT]]			; CHECK-NEXT: br label [[EXTEND_BW_EXIT]]
	; CHECK: extend_bw.exit:			; CHECK: extend_bw.exit:
	; CHECK-NEXT: [[TMP4:%.*]] = phi <2 x i32> [ [[TMP3]], [[IF_THEN_I]] ], [ undef, [[WHILE_END275_I]] ]			; CHECK-NEXT: [[ADD14_I1262:%.*]] = phi i32 [ [[ADD14_I]], [[IF_THEN_I]] ], [ undef, [[WHILE_END275_I]] ]
				; CHECK-NEXT: [[ADD15_I1261:%.*]] = phi i32 [ [[ADD15_I]], [[IF_THEN_I]] ], [ undef, [[WHILE_END275_I]] ]
	; CHECK-NEXT: br i1 false, label [[IF_THEN157:%.*]], label [[LAND_LHS_TRUE167]]			; CHECK-NEXT: br i1 false, label [[IF_THEN157:%.*]], label [[LAND_LHS_TRUE167]]
	; CHECK: if.then157:			; CHECK: if.then157:
	; CHECK-NEXT: [[TMP5:%.*]] = add nsw <2 x i32> [[TMP4]], <i32 1, i32 1>			; CHECK-NEXT: [[ADD158:%.*]] = add nsw i32 [[ADD14_I1262]], 1
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[FROM1115]] to <2 x i32>*			; CHECK-NEXT: store i32 [[ADD158]], i32* [[FROM299]], align 4
	; CHECK-NEXT: store <2 x i32> [[TMP5]], <2 x i32>* [[TMP6]], align 4			; CHECK-NEXT: [[ADD160:%.*]] = add nsw i32 [[ADD15_I1261]], 1
				; CHECK-NEXT: store i32 [[ADD160]], i32* [[FROM1115]], align 4
	; CHECK-NEXT: br label [[LAND_LHS_TRUE167]]			; CHECK-NEXT: br label [[LAND_LHS_TRUE167]]
	; CHECK: land.lhs.true167:			; CHECK: land.lhs.true167:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: for.inc603:			; CHECK: for.inc603:
	; CHECK-NEXT: br i1 undef, label [[FOR_BODY]], label [[FOR_END605]]			; CHECK-NEXT: br i1 undef, label [[FOR_BODY]], label [[FOR_END605]]
	; CHECK: for.end605:			; CHECK: for.end605:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: return:			; CHECK: return:

llvm/test/Transforms/SLPVectorizer/X86/cse.ll


	define i32 @test(double* nocapture %G) {			define i32 @test(double* nocapture %G) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[G:%.*]], i64 5			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[G:%.*]], i64 5
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 4.000000e+00, double 3.000000e+00>			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 4.000000e+00, double 3.000000e+00>
	; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], <double 1.000000e+00, double 6.000000e+00>			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x double> [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[G]] to <2 x double>*			; CHECK-NEXT: [[MUL11:%.*]] = fmul double [[TMP3]], 4.000000e+00
	; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 8			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 undef>
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP2]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x double> [[TMP4]], double [[MUL11]], i32 3
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds double, double* [[G]], i64 2			; CHECK-NEXT: [[TMP6:%.*]] = fadd <4 x double> [[TMP5]], <double 1.000000e+00, double 6.000000e+00, double 7.000000e+00, double 8.000000e+00>
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP1]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = bitcast double* [[G]] to <4 x double>*
	; CHECK-NEXT: [[MUL11:%.*]] = fmul double [[TMP6]], 4.000000e+00			; CHECK-NEXT: store <4 x double> [[TMP6]], <4 x double>* [[TMP7]], align 8
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> poison, double [[TMP5]], i32 0
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP7]], double [[MUL11]], i32 1
	; CHECK-NEXT: [[TMP9:%.*]] = fadd <2 x double> [[TMP8]], <double 7.000000e+00, double 8.000000e+00>
	; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[ARRAYIDX9]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP9]], <2 x double>* [[TMP10]], align 8
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds double, double* %G, i64 5			%arrayidx = getelementptr inbounds double, double* %G, i64 5
	%0 = load double, double* %arrayidx, align 8			%0 = load double, double* %arrayidx, align 8
	%mul = fmul double %0, 4.000000e+00			%mul = fmul double %0, 4.000000e+00
	%add = fadd double %mul, 1.000000e+00			%add = fadd double %mul, 1.000000e+00
	store double %add, double* %G, align 8			store double %add, double* %G, align 8

llvm/test/Transforms/SLPVectorizer/X86/extractcost.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx \| FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx \| FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.8.0"			target triple = "x86_64-apple-macosx10.8.0"

	define i32 @foo(i32* nocapture %A, i32 %n, i32 %m) {			define i32 @foo(i32* nocapture %A, i32 %n, i32 %m) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[N:%.*]], i32 0			; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[N:%.*]], 5
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[MUL]], 9
	; CHECK-NEXT: [[TMP1:%.*]] = mul nsw <4 x i32> [[SHUFFLE]], <i32 5, i32 9, i32 3, i32 10>			; CHECK-NEXT: store i32 [[ADD]], i32* [[A:%.*]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = shl <4 x i32> [[SHUFFLE]], <i32 5, i32 9, i32 3, i32 10>			; CHECK-NEXT: [[MUL1:%.*]] = mul nsw i32 [[N]], 9
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 6, i32 3>			; CHECK-NEXT: [[ADD2:%.*]] = add nsw i32 [[MUL1]], 9
	; CHECK-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[TMP3]], <i32 9, i32 9, i32 9, i32 9>			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 1
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[A:%.*]] to <4 x i32>*			; CHECK-NEXT: store i32 [[ADD2]], i32* [[ARRAYIDX3]], align 4
	; CHECK-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* [[TMP5]], align 4			; CHECK-NEXT: [[MUL4:%.*]] = shl i32 [[N]], 3
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <4 x i32> [[TMP4]], i32 0			; CHECK-NEXT: [[ADD5:%.*]] = add nsw i32 [[MUL4]], 9
	; CHECK-NEXT: [[EXTERNALUSE1:%.*]] = add nsw i32 [[TMP6]], [[M:%.*]]			; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 2
	; CHECK-NEXT: [[EXTERNALUSE2:%.*]] = mul nsw i32 [[TMP6]], [[M]]			; CHECK-NEXT: store i32 [[ADD5]], i32* [[ARRAYIDX6]], align 4
				; CHECK-NEXT: [[MUL7:%.*]] = mul nsw i32 [[N]], 10
				; CHECK-NEXT: [[ADD8:%.*]] = add nsw i32 [[MUL7]], 9
				; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 3
				; CHECK-NEXT: store i32 [[ADD8]], i32* [[ARRAYIDX9]], align 4
				; CHECK-NEXT: [[EXTERNALUSE1:%.*]] = add nsw i32 [[ADD]], [[M:%.*]]
				; CHECK-NEXT: [[EXTERNALUSE2:%.*]] = mul nsw i32 [[ADD]], [[M]]
	; CHECK-NEXT: [[ADD10:%.*]] = add nsw i32 [[EXTERNALUSE1]], [[EXTERNALUSE2]]			; CHECK-NEXT: [[ADD10:%.*]] = add nsw i32 [[EXTERNALUSE1]], [[EXTERNALUSE2]]
	; CHECK-NEXT: ret i32 [[ADD10]]			; CHECK-NEXT: ret i32 [[ADD10]]
	;			;
	entry:			entry:
	%mul = mul nsw i32 %n, 5			%mul = mul nsw i32 %n, 5
	%add = add nsw i32 %mul, 9			%add = add nsw i32 %mul, 9
	store i32 %add, i32* %A, align 4			store i32 %add, i32* %A, align 4
	%mul1 = mul nsw i32 %n, 9			%mul1 = mul nsw i32 %n, 9

llvm/test/Transforms/SLPVectorizer/X86/geps-non-pow-2.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux -mcpu=haswell < %s \| FileCheck %s			; RUN: opt -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux -mcpu=haswell < %s \| FileCheck %s
	@e = dso_local local_unnamed_addr global i32 0, align 4			@e = dso_local local_unnamed_addr global i32 0, align 4
	@f = dso_local local_unnamed_addr global i32 0, align 4			@f = dso_local local_unnamed_addr global i32 0, align 4

	; Function Attrs: nofree norecurse nounwind uwtable			; Function Attrs: nofree norecurse nounwind uwtable
	define dso_local i32 @g() local_unnamed_addr {			define dso_local i32 @g() local_unnamed_addr {
	; CHECK-LABEL: @g(			; CHECK-LABEL: @g(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @e, align 4			; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* @e, align 4
	; CHECK-NEXT: [[TOBOOL_NOT19:%.*]] = icmp eq i32 [[TMP0]], 0			; CHECK-NEXT: [[TOBOOL_NOT19:%.*]] = icmp eq i32 [[TMP0]], 0
	; CHECK-NEXT: br i1 [[TOBOOL_NOT19]], label [[WHILE_END:%.*]], label [[WHILE_BODY:%.*]]			; CHECK-NEXT: br i1 [[TOBOOL_NOT19]], label [[WHILE_END:%.*]], label [[WHILE_BODY:%.*]]
	; CHECK: while.body:			; CHECK: while.body:
	; CHECK-NEXT: [[C_022:%.*]] = phi i32* [ [[C_022_BE:%.*]], [[WHILE_BODY_BACKEDGE:%.*]] ], [ undef, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[C_022:%.*]] = phi i32* [ [[C_022_BE:%.*]], [[WHILE_BODY_BACKEDGE:%.*]] ], [ undef, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32*> [ [[TMP14:%.*]], [[WHILE_BODY_BACKEDGE]] ], [ undef, [[ENTRY]] ]			; CHECK-NEXT: [[B_021:%.*]] = phi i32* [ [[B_021_BE:%.*]], [[WHILE_BODY_BACKEDGE]] ], [ undef, [[ENTRY]] ]
				; CHECK-NEXT: [[A_020:%.*]] = phi i32* [ [[A_020_BE:%.*]], [[WHILE_BODY_BACKEDGE]] ], [ undef, [[ENTRY]] ]
	; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[C_022]], i64 1			; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[C_022]], i64 1
	; CHECK-NEXT: [[TMP2:%.*]] = ptrtoint i32* [[C_022]] to i64			; CHECK-NEXT: [[TMP1:%.*]] = ptrtoint i32* [[C_022]] to i64
	; CHECK-NEXT: [[TMP3:%.*]] = trunc i64 [[TMP2]] to i32			; CHECK-NEXT: [[TMP2:%.*]] = trunc i64 [[TMP1]] to i32
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr i32, <2 x i32*> [[TMP1]], <2 x i64> <i64 1, i64 1>			; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[A_020]], i64 1
	; CHECK-NEXT: switch i32 [[TMP3]], label [[WHILE_BODY_BACKEDGE]] [			; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[B_021]], i64 1
				; CHECK-NEXT: switch i32 [[TMP2]], label [[WHILE_BODY_BACKEDGE]] [
	; CHECK-NEXT: i32 2, label [[SW_BB:%.*]]			; CHECK-NEXT: i32 2, label [[SW_BB:%.*]]
	; CHECK-NEXT: i32 4, label [[SW_BB6:%.*]]			; CHECK-NEXT: i32 4, label [[SW_BB6:%.*]]
	; CHECK-NEXT: ]			; CHECK-NEXT: ]
	; CHECK: sw.bb:			; CHECK: sw.bb:
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i32*> [[TMP4]], i32 0			; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[B_021]], i64 2
	; CHECK-NEXT: [[TMP6:%.*]] = ptrtoint i32* [[TMP5]] to i64			; CHECK-NEXT: [[TMP3:%.*]] = ptrtoint i32* [[INCDEC_PTR2]] to i64
	; CHECK-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; CHECK-NEXT: [[TMP4:%.*]] = trunc i64 [[TMP3]] to i32
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr i32, <2 x i32*> [[TMP1]], <2 x i64> <i64 2, i64 2>			; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[A_020]], i64 2
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x i32*> [[TMP4]], i32 1			; CHECK-NEXT: store i32 [[TMP4]], i32* [[INCDEC_PTR1]], align 4
	; CHECK-NEXT: store i32 [[TMP7]], i32* [[TMP9]], align 4
	; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds i32, i32* [[C_022]], i64 2			; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds i32, i32* [[C_022]], i64 2
	; CHECK-NEXT: br label [[WHILE_BODY_BACKEDGE]]			; CHECK-NEXT: br label [[WHILE_BODY_BACKEDGE]]
	; CHECK: sw.bb6:			; CHECK: sw.bb6:
				; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds i32, i32* [[A_020]], i64 2
	; CHECK-NEXT: [[INCDEC_PTR8:%.*]] = getelementptr inbounds i32, i32* [[C_022]], i64 2			; CHECK-NEXT: [[INCDEC_PTR8:%.*]] = getelementptr inbounds i32, i32* [[C_022]], i64 2
	; CHECK-NEXT: [[TMP10:%.*]] = ptrtoint i32* [[INCDEC_PTR]] to i64			; CHECK-NEXT: [[TMP5:%.*]] = ptrtoint i32* [[INCDEC_PTR]] to i64
	; CHECK-NEXT: [[TMP11:%.*]] = trunc i64 [[TMP10]] to i32			; CHECK-NEXT: [[TMP6:%.*]] = trunc i64 [[TMP5]] to i32
	; CHECK-NEXT: [[TMP12:%.*]] = getelementptr i32, <2 x i32*> [[TMP1]], <2 x i64> <i64 2, i64 2>			; CHECK-NEXT: [[INCDEC_PTR9:%.*]] = getelementptr inbounds i32, i32* [[B_021]], i64 2
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x i32*> [[TMP4]], i32 0			; CHECK-NEXT: store i32 [[TMP6]], i32* [[INCDEC_PTR2]], align 4
	; CHECK-NEXT: store i32 [[TMP11]], i32* [[TMP13]], align 4
	; CHECK-NEXT: br label [[WHILE_BODY_BACKEDGE]]			; CHECK-NEXT: br label [[WHILE_BODY_BACKEDGE]]
	; CHECK: while.body.backedge:			; CHECK: while.body.backedge:
	; CHECK-NEXT: [[C_022_BE]] = phi i32* [ [[INCDEC_PTR]], [[WHILE_BODY]] ], [ [[INCDEC_PTR8]], [[SW_BB6]] ], [ [[INCDEC_PTR5]], [[SW_BB]] ]			; CHECK-NEXT: [[C_022_BE]] = phi i32* [ [[INCDEC_PTR]], [[WHILE_BODY]] ], [ [[INCDEC_PTR8]], [[SW_BB6]] ], [ [[INCDEC_PTR5]], [[SW_BB]] ]
	; CHECK-NEXT: [[TMP14]] = phi <2 x i32*> [ [[TMP4]], [[WHILE_BODY]] ], [ [[TMP12]], [[SW_BB6]] ], [ [[TMP8]], [[SW_BB]] ]			; CHECK-NEXT: [[B_021_BE]] = phi i32* [ [[INCDEC_PTR2]], [[WHILE_BODY]] ], [ [[INCDEC_PTR9]], [[SW_BB6]] ], [ [[INCDEC_PTR3]], [[SW_BB]] ]
				; CHECK-NEXT: [[A_020_BE]] = phi i32* [ [[INCDEC_PTR1]], [[WHILE_BODY]] ], [ [[INCDEC_PTR7]], [[SW_BB6]] ], [ [[INCDEC_PTR4]], [[SW_BB]] ]
	; CHECK-NEXT: br label [[WHILE_BODY]]			; CHECK-NEXT: br label [[WHILE_BODY]]
	; CHECK: while.end:			; CHECK: while.end:
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	entry:			entry:
	%0 = load i32, i32* @e, align 4			%0 = load i32, i32* @e, align 4
	%tobool.not19 = icmp eq i32 %0, 0			%tobool.not19 = icmp eq i32 %0, 0
	br i1 %tobool.not19, label %while.end, label %while.body			br i1 %tobool.not19, label %while.end, label %while.body

llvm/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll

;
%mh = tail call i8 @llvm.umin.i8(i8 %mfeba, i8 %mdc98)		%mh = tail call i8 @llvm.umin.i8(i8 %mfeba, i8 %mdc98)
%m = tail call i8 @llvm.umin.i8(i8 %mh, i8 %ml)		%m = tail call i8 @llvm.umin.i8(i8 %mh, i8 %ml)
ret i8 %m		ret i8 %m
}		}

; This should not crash.		; This should not crash.

define void @PR49730() {		define void @PR49730() {
; SSE-LABEL: @PR49730(		; CHECK-LABEL: @PR49730(
; SSE-NEXT: [[T:%.*]] = call i32 @llvm.smin.i32(i32 undef, i32 2)		; CHECK-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 1, i32 1>)
; SSE-NEXT: [[T1:%.*]] = sub nsw i32 undef, [[T]]		; CHECK-NEXT: [[TMP2:%.*]] = sub nsw <4 x i32> undef, [[TMP1]]
; SSE-NEXT: [[T2:%.*]] = call i32 @llvm.umin.i32(i32 undef, i32 [[T1]])		; CHECK-NEXT: [[T12:%.*]] = sub nsw i32 undef, undef
; SSE-NEXT: [[T3:%.*]] = call i32 @llvm.smin.i32(i32 undef, i32 2)		; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[TMP2]])
; SSE-NEXT: [[T4:%.*]] = sub nsw i32 undef, [[T3]]		; CHECK-NEXT: [[TMP4:%.*]] = call i32 @llvm.umin.i32(i32 [[T12]], i32 undef)
; SSE-NEXT: [[T5:%.*]] = call i32 @llvm.umin.i32(i32 [[T2]], i32 [[T4]])		; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP3]], i32 [[TMP4]])
; SSE-NEXT: [[T6:%.*]] = call i32 @llvm.smin.i32(i32 undef, i32 1)		; CHECK-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP5]], i32 93)
; SSE-NEXT: [[T7:%.*]] = sub nuw nsw i32 undef, [[T6]]		; CHECK-NEXT: ret void
; SSE-NEXT: [[T8:%.*]] = call i32 @llvm.umin.i32(i32 [[T5]], i32 [[T7]])
; SSE-NEXT: [[T9:%.*]] = call i32 @llvm.smin.i32(i32 undef, i32 1)
; SSE-NEXT: [[T10:%.*]] = sub nsw i32 undef, [[T9]]
; SSE-NEXT: [[T11:%.*]] = call i32 @llvm.umin.i32(i32 [[T8]], i32 [[T10]])
; SSE-NEXT: [[T12:%.*]] = sub nsw i32 undef, undef
; SSE-NEXT: [[T13:%.*]] = call i32 @llvm.umin.i32(i32 [[T11]], i32 [[T12]])
; SSE-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[T13]], i32 93)
; SSE-NEXT: ret void
;
; AVX-LABEL: @PR49730(
; AVX-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 1, i32 1>)
; AVX-NEXT: [[TMP2:%.*]] = sub nsw <4 x i32> undef, [[TMP1]]
; AVX-NEXT: [[T12:%.*]] = sub nsw i32 undef, undef
; AVX-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[TMP2]])
; AVX-NEXT: [[TMP4:%.*]] = call i32 @llvm.umin.i32(i32 [[T12]], i32 undef)
; AVX-NEXT: [[TMP5:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP3]], i32 [[TMP4]])
; AVX-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP5]], i32 93)
; AVX-NEXT: ret void
;
; AVX2-LABEL: @PR49730(
; AVX2-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 1, i32 1>)
; AVX2-NEXT: [[TMP2:%.*]] = sub nsw <4 x i32> undef, [[TMP1]]
; AVX2-NEXT: [[T12:%.*]] = sub nsw i32 undef, undef
; AVX2-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[TMP2]])
; AVX2-NEXT: [[TMP4:%.*]] = call i32 @llvm.umin.i32(i32 [[T12]], i32 undef)
; AVX2-NEXT: [[TMP5:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP3]], i32 [[TMP4]])
; AVX2-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP5]], i32 93)
; AVX2-NEXT: ret void
;
; THRESH-LABEL: @PR49730(
; THRESH-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.smin.v4i32(<4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 1, i32 1>)
; THRESH-NEXT: [[TMP2:%.*]] = sub nsw <4 x i32> undef, [[TMP1]]
; THRESH-NEXT: [[T12:%.*]] = sub nsw i32 undef, undef
; THRESH-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[TMP2]])
; THRESH-NEXT: [[TMP4:%.*]] = call i32 @llvm.umin.i32(i32 [[T12]], i32 undef)
; THRESH-NEXT: [[TMP5:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP3]], i32 [[TMP4]])
; THRESH-NEXT: [[T14:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP5]], i32 93)
; THRESH-NEXT: ret void
;		;
%t = call i32 @llvm.smin.i32(i32 undef, i32 2)		%t = call i32 @llvm.smin.i32(i32 undef, i32 2)
%t1 = sub nsw i32 undef, %t		%t1 = sub nsw i32 undef, %t
%t2 = call i32 @llvm.umin.i32(i32 undef, i32 %t1)		%t2 = call i32 @llvm.umin.i32(i32 undef, i32 %t1)
%t3 = call i32 @llvm.smin.i32(i32 undef, i32 2)		%t3 = call i32 @llvm.smin.i32(i32 undef, i32 2)
%t4 = sub nsw i32 undef, %t3		%t4 = sub nsw i32 undef, %t3
%t5 = call i32 @llvm.umin.i32(i32 %t2, i32 %t4)		%t5 = call i32 @llvm.umin.i32(i32 %t2, i32 %t4)
%t6 = call i32 @llvm.smin.i32(i32 undef, i32 1)		%t6 = call i32 @llvm.smin.i32(i32 undef, i32 1)

llvm/test/Transforms/SLPVectorizer/X86/horizontal.ll

	; ALL-NEXT: [[CMP1495:%.*]] = icmp eq i32 [[ARG_B:%.*]], 0			; ALL-NEXT: [[CMP1495:%.*]] = icmp eq i32 [[ARG_B:%.*]], 0
	; ALL-NEXT: br label [[FOR_BODY:%.*]]			; ALL-NEXT: br label [[FOR_BODY:%.*]]
	; ALL: for.cond.cleanup:			; ALL: for.cond.cleanup:
	; ALL-NEXT: ret void			; ALL-NEXT: ret void
	; ALL: for.body:			; ALL: for.body:
	; ALL-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_COND_CLEANUP15:%.*]] ]			; ALL-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_COND_CLEANUP15:%.*]] ]
	; ALL-NEXT: [[TMP0:%.*]] = shl i64 [[INDVARS_IV]], 2			; ALL-NEXT: [[TMP0:%.*]] = shl i64 [[INDVARS_IV]], 2
	; ALL-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[ARRAY:%.*]], i64 [[TMP0]]			; ALL-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[ARRAY:%.*]], i64 [[TMP0]]
	; ALL-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX]], align 4			; ALL-NEXT: [[TMP1:%.*]] = or i64 [[TMP0]], 1
	; ALL-NEXT: [[TMP2:%.*]] = or i64 [[TMP0]], 1			; ALL-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds float, float* [[ARRAY]], i64 [[TMP1]]
	; ALL-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds float, float* [[ARRAY]], i64 [[TMP2]]			; ALL-NEXT: [[TMP2:%.*]] = or i64 [[TMP0]], 2
	; ALL-NEXT: [[TMP3:%.*]] = load float, float* [[ARRAYIDX4]], align 4			; ALL-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds float, float* [[ARRAY]], i64 [[TMP2]]
	; ALL-NEXT: [[TMP4:%.*]] = or i64 [[TMP0]], 2			; ALL-NEXT: [[TMP3:%.*]] = or i64 [[TMP0]], 3
	; ALL-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds float, float* [[ARRAY]], i64 [[TMP4]]			; ALL-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds float, float* [[ARRAY]], i64 [[TMP3]]
	; ALL-NEXT: [[TMP5:%.*]] = load float, float* [[ARRAYIDX8]], align 4			; ALL-NEXT: [[TMP4:%.*]] = bitcast float* [[ARRAYIDX]] to <4 x float>*
	; ALL-NEXT: [[TMP6:%.*]] = or i64 [[TMP0]], 3			; ALL-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* [[TMP4]], align 4
	; ALL-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds float, float* [[ARRAY]], i64 [[TMP6]]			; ALL-NEXT: [[TMP6:%.*]] = extractelement <4 x float> [[TMP5]], i32 0
	; ALL-NEXT: [[TMP7:%.*]] = load float, float* [[ARRAYIDX12]], align 4			; ALL-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[TMP5]], i32 1
				; ALL-NEXT: [[TMP8:%.*]] = extractelement <4 x float> [[TMP5]], i32 2
				; ALL-NEXT: [[TMP9:%.*]] = extractelement <4 x float> [[TMP5]], i32 3
	; ALL-NEXT: br i1 [[CMP1495]], label [[FOR_COND_CLEANUP15]], label [[FOR_BODY16_LR_PH:%.*]]			; ALL-NEXT: br i1 [[CMP1495]], label [[FOR_COND_CLEANUP15]], label [[FOR_BODY16_LR_PH:%.*]]
	; ALL: for.body16.lr.ph:			; ALL: for.body16.lr.ph:
	; ALL-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds float, float* [[ARG_A:%.*]], i64 [[INDVARS_IV]]			; ALL-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds float, float* [[ARG_A:%.*]], i64 [[INDVARS_IV]]
	; ALL-NEXT: [[TMP8:%.*]] = load float, float* [[ADD_PTR]], align 4			; ALL-NEXT: [[TMP10:%.*]] = load float, float* [[ADD_PTR]], align 4
	; ALL-NEXT: br label [[FOR_BODY16:%.*]]			; ALL-NEXT: br label [[FOR_BODY16:%.*]]
	; ALL: for.cond.cleanup15:			; ALL: for.cond.cleanup15:
	; ALL-NEXT: [[W2_0_LCSSA:%.*]] = phi float [ [[TMP5]], [[FOR_BODY]] ], [ [[SUB28:%.*]], [[FOR_BODY16]] ]			; ALL-NEXT: [[W2_0_LCSSA:%.*]] = phi float [ [[TMP8]], [[FOR_BODY]] ], [ [[OP_RDX:%.*]], [[FOR_BODY16]] ]
	; ALL-NEXT: [[W3_0_LCSSA:%.*]] = phi float [ [[TMP7]], [[FOR_BODY]] ], [ [[W2_096:%.*]], [[FOR_BODY16]] ]			; ALL-NEXT: [[W3_0_LCSSA:%.*]] = phi float [ [[TMP9]], [[FOR_BODY]] ], [ [[TMP24:%.*]], [[FOR_BODY16]] ]
	; ALL-NEXT: [[W1_0_LCSSA:%.*]] = phi float [ [[TMP3]], [[FOR_BODY]] ], [ [[W0_0100:%.*]], [[FOR_BODY16]] ]			; ALL-NEXT: [[W1_0_LCSSA:%.*]] = phi float [ [[TMP7]], [[FOR_BODY]] ], [ [[TMP12:%.*]], [[FOR_BODY16]] ]
	; ALL-NEXT: [[W0_0_LCSSA:%.*]] = phi float [ [[TMP1]], [[FOR_BODY]] ], [ [[SUB19:%.*]], [[FOR_BODY16]] ]			; ALL-NEXT: [[W0_0_LCSSA:%.*]] = phi float [ [[TMP6]], [[FOR_BODY]] ], [ [[SUB19:%.*]], [[FOR_BODY16]] ]
	; ALL-NEXT: store float [[W0_0_LCSSA]], float* [[ARRAYIDX]], align 4			; ALL-NEXT: store float [[W0_0_LCSSA]], float* [[ARRAYIDX]], align 4
	; ALL-NEXT: store float [[W1_0_LCSSA]], float* [[ARRAYIDX4]], align 4			; ALL-NEXT: store float [[W1_0_LCSSA]], float* [[ARRAYIDX4]], align 4
	; ALL-NEXT: store float [[W2_0_LCSSA]], float* [[ARRAYIDX8]], align 4			; ALL-NEXT: store float [[W2_0_LCSSA]], float* [[ARRAYIDX8]], align 4
	; ALL-NEXT: store float [[W3_0_LCSSA]], float* [[ARRAYIDX12]], align 4			; ALL-NEXT: store float [[W3_0_LCSSA]], float* [[ARRAYIDX12]], align 4
	; ALL-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; ALL-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; ALL-NEXT: [[EXITCOND109:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 6			; ALL-NEXT: [[EXITCOND109:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 6
	; ALL-NEXT: br i1 [[EXITCOND109]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]			; ALL-NEXT: br i1 [[EXITCOND109]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]
	; ALL: for.body16:			; ALL: for.body16:
	; ALL-NEXT: [[W0_0100]] = phi float [ [[TMP1]], [[FOR_BODY16_LR_PH]] ], [ [[SUB19]], [[FOR_BODY16]] ]
	; ALL-NEXT: [[W1_099:%.*]] = phi float [ [[TMP3]], [[FOR_BODY16_LR_PH]] ], [ [[W0_0100]], [[FOR_BODY16]] ]
	; ALL-NEXT: [[J_098:%.*]] = phi i32 [ 0, [[FOR_BODY16_LR_PH]] ], [ [[INC:%.*]], [[FOR_BODY16]] ]			; ALL-NEXT: [[J_098:%.*]] = phi i32 [ 0, [[FOR_BODY16_LR_PH]] ], [ [[INC:%.*]], [[FOR_BODY16]] ]
	; ALL-NEXT: [[W3_097:%.*]] = phi float [ [[TMP7]], [[FOR_BODY16_LR_PH]] ], [ [[W2_096]], [[FOR_BODY16]] ]			; ALL-NEXT: [[TMP11:%.*]] = phi <4 x float> [ [[TMP5]], [[FOR_BODY16_LR_PH]] ], [ [[TMP23:%.*]], [[FOR_BODY16]] ]
	; ALL-NEXT: [[W2_096]] = phi float [ [[TMP5]], [[FOR_BODY16_LR_PH]] ], [ [[SUB28]], [[FOR_BODY16]] ]			; ALL-NEXT: [[TMP12]] = extractelement <4 x float> [[TMP11]], i32 0
	; ALL-NEXT: [[MUL17:%.*]] = fmul fast float [[W0_0100]], 0x3FF19999A0000000			; ALL-NEXT: [[TMP13:%.*]] = extractelement <4 x float> [[TMP11]], i32 1
	; ALL-NEXT: [[MUL18_NEG:%.*]] = fmul fast float [[W1_099]], 0xBFF3333340000000			; ALL-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[TMP12]], i32 0
	; ALL-NEXT: [[SUB92:%.*]] = fadd fast float [[MUL17]], [[MUL18_NEG]]			; ALL-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[TMP13]], i32 1
	; ALL-NEXT: [[SUB19]] = fadd fast float [[SUB92]], [[TMP8]]			; ALL-NEXT: [[TMP16:%.*]] = fmul fast <2 x float> [[TMP15]], <float 0x3FF19999A0000000, float 0xBFF3333340000000>
				; ALL-NEXT: [[TMP17:%.*]] = extractelement <2 x float> [[TMP16]], i32 0
				; ALL-NEXT: [[TMP18:%.*]] = extractelement <2 x float> [[TMP16]], i32 1
				; ALL-NEXT: [[SUB92:%.*]] = fadd fast float [[TMP17]], [[TMP18]]
				; ALL-NEXT: [[SUB19]] = fadd fast float [[SUB92]], [[TMP10]]
	; ALL-NEXT: [[MUL20:%.*]] = fmul fast float [[SUB19]], 0x4000CCCCC0000000			; ALL-NEXT: [[MUL20:%.*]] = fmul fast float [[SUB19]], 0x4000CCCCC0000000
	; ALL-NEXT: [[MUL21_NEG:%.*]] = fmul fast float [[W0_0100]], 0xC0019999A0000000			; ALL-NEXT: [[TMP19:%.*]] = fmul fast <4 x float> [[TMP11]], <float 0xC0019999A0000000, float 0x4002666660000000, float 0x4008CCCCC0000000, float 0xC0099999A0000000>
	; ALL-NEXT: [[MUL23:%.*]] = fmul fast float [[W1_099]], 0x4002666660000000			; ALL-NEXT: [[TMP20:%.*]] = call fast float @llvm.vector.reduce.fadd.v4f32(float -0.000000e+00, <4 x float> [[TMP19]])
	; ALL-NEXT: [[MUL25:%.*]] = fmul fast float [[W2_096]], 0x4008CCCCC0000000			; ALL-NEXT: [[OP_RDX]] = fadd fast float [[TMP20]], [[MUL20]]
	; ALL-NEXT: [[MUL27_NEG:%.*]] = fmul fast float [[W3_097]], 0xC0099999A0000000
	; ALL-NEXT: [[ADD2293:%.*]] = fadd fast float [[MUL27_NEG]], [[MUL25]]
	; ALL-NEXT: [[ADD24:%.*]] = fadd fast float [[ADD2293]], [[MUL23]]
	; ALL-NEXT: [[SUB2694:%.*]] = fadd fast float [[ADD24]], [[MUL21_NEG]]
	; ALL-NEXT: [[SUB28]] = fadd fast float [[SUB2694]], [[MUL20]]
	; ALL-NEXT: [[INC]] = add nuw i32 [[J_098]], 1			; ALL-NEXT: [[INC]] = add nuw i32 [[J_098]], 1
	; ALL-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[ARG_B]]			; ALL-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[ARG_B]]
				; ALL-NEXT: [[TMP21:%.*]] = insertelement <4 x float> poison, float [[SUB19]], i32 0
				; ALL-NEXT: [[TMP22:%.*]] = shufflevector <4 x float> [[TMP21]], <4 x float> [[TMP11]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>
				; ALL-NEXT: [[TMP23]] = insertelement <4 x float> [[TMP22]], float [[OP_RDX]], i32 2
				; ALL-NEXT: [[TMP24]] = extractelement <4 x float> [[TMP11]], i32 2
	; ALL-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP15]], label [[FOR_BODY16]]			; ALL-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP15]], label [[FOR_BODY16]]
	;			;
	entry:			entry:
	%cmp1495 = icmp eq i32 %arg_B, 0			%cmp1495 = icmp eq i32 %arg_B, 0
	br label %for.body			br label %for.body

	for.cond.cleanup: ; preds = %for.cond.cleanup15			for.cond.cleanup: ; preds = %for.cond.cleanup15
	ret void			ret void

llvm/test/Transforms/SLPVectorizer/X86/lookahead.ll

;
%sidx1 = getelementptr inbounds double, double* %storeArray, i64 1		%sidx1 = getelementptr inbounds double, double* %storeArray, i64 1
store double %add0, double *%sidx0, align 8		store double %add0, double *%sidx0, align 8
store double %add1, double *%sidx1, align 8		store double %add1, double *%sidx1, align 8
ret void		ret void
}		}


define i1 @ExtractIdxNotConstantInt1(float %a, float %b, float %c, <4 x float> %vec, i64 %idx2) {		define i1 @ExtractIdxNotConstantInt1(float %a, float %b, float %c, <4 x float> %vec, i64 %idx2) {
; SSE-LABEL: @ExtractIdxNotConstantInt1(		; CHECK-LABEL: @ExtractIdxNotConstantInt1(
; SSE-NEXT: [[VECEXT_I291_I166:%.*]] = extractelement <4 x float> [[VEC:%.*]], i64 undef		; CHECK-NEXT: [[VECEXT_I291_I166:%.*]] = extractelement <4 x float> [[VEC:%.*]], i64 undef
; SSE-NEXT: [[SUB14_I167:%.*]] = fsub float undef, [[VECEXT_I291_I166]]		; CHECK-NEXT: [[SUB14_I167:%.*]] = fsub float undef, [[VECEXT_I291_I166]]
; SSE-NEXT: [[VECEXT_I276_I169:%.*]] = extractelement <4 x float> [[VEC]], i64 [[IDX2:%.*]]		; CHECK-NEXT: [[VECEXT_I276_I169:%.*]] = extractelement <4 x float> [[VEC]], i64 [[IDX2:%.*]]
; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[A:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[A:%.*]], i32 0
; SSE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[C:%.*]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[C:%.*]], i32 1
; SSE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[SUB14_I167]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[SUB14_I167]], i32 0
; SSE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_I276_I169]], i32 1		; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_I276_I169]], i32 1
; SSE-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP2]], [[TMP4]]		; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP2]], [[TMP4]]
; SSE-NEXT: [[TMP6:%.*]] = insertelement <2 x float> <float poison, float 3.000000e+01>, float [[B:%.*]], i32 0		; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> <float poison, float 3.000000e+01>, float [[B:%.*]], i32 0
; SSE-NEXT: [[TMP7:%.*]] = fsub <2 x float> [[TMP5]], [[TMP6]]		; CHECK-NEXT: [[TMP7:%.*]] = fsub <2 x float> [[TMP5]], [[TMP6]]
; SSE-NEXT: [[TMP8:%.*]] = fadd <2 x float> [[TMP7]], <float 1.000000e+01, float 2.000000e+00>		; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x float> [[TMP7]], <float 1.000000e+01, float 2.000000e+00>
; SSE-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0		; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0
; SSE-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1		; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1
; SSE-NEXT: [[MUL123_I184:%.*]] = fmul float [[TMP9]], [[TMP10]]		; CHECK-NEXT: [[MUL123_I184:%.*]] = fmul float [[TMP9]], [[TMP10]]
; SSE-NEXT: [[CMP_I185:%.*]] = fcmp ogt float [[MUL123_I184]], 0.000000e+00		; CHECK-NEXT: [[CMP_I185:%.*]] = fcmp ogt float [[MUL123_I184]], 0.000000e+00
; SSE-NEXT: ret i1 [[CMP_I185]]		; CHECK-NEXT: ret i1 [[CMP_I185]]
;
; AVX-LABEL: @ExtractIdxNotConstantInt1(
; AVX-NEXT: [[VECEXT_I291_I166:%.*]] = extractelement <4 x float> [[VEC:%.*]], i64 undef
; AVX-NEXT: [[SUB14_I167:%.*]] = fsub float undef, [[VECEXT_I291_I166]]
; AVX-NEXT: [[FM:%.*]] = fmul float [[A:%.*]], [[SUB14_I167]]
; AVX-NEXT: [[SUB25_I168:%.*]] = fsub float [[FM]], [[B:%.*]]
; AVX-NEXT: [[VECEXT_I276_I169:%.*]] = extractelement <4 x float> [[VEC]], i64 [[IDX2:%.*]]
; AVX-NEXT: [[ADD36_I173:%.*]] = fadd float [[SUB25_I168]], 1.000000e+01
; AVX-NEXT: [[MUL72_I179:%.*]] = fmul float [[C:%.*]], [[VECEXT_I276_I169]]
; AVX-NEXT: [[ADD78_I180:%.*]] = fsub float [[MUL72_I179]], 3.000000e+01
; AVX-NEXT: [[ADD79_I181:%.*]] = fadd float 2.000000e+00, [[ADD78_I180]]
; AVX-NEXT: [[MUL123_I184:%.*]] = fmul float [[ADD36_I173]], [[ADD79_I181]]
; AVX-NEXT: [[CMP_I185:%.*]] = fcmp ogt float [[MUL123_I184]], 0.000000e+00
; AVX-NEXT: ret i1 [[CMP_I185]]
;		;
%vecext.i291.i166 = extractelement <4 x float> %vec, i64 undef		%vecext.i291.i166 = extractelement <4 x float> %vec, i64 undef
%sub14.i167 = fsub float undef, %vecext.i291.i166		%sub14.i167 = fsub float undef, %vecext.i291.i166
%fm = fmul float %a, %sub14.i167		%fm = fmul float %a, %sub14.i167
%sub25.i168 = fsub float %fm, %b		%sub25.i168 = fsub float %fm, %b
%vecext.i276.i169 = extractelement <4 x float> %vec, i64 %idx2		%vecext.i276.i169 = extractelement <4 x float> %vec, i64 %idx2
%add36.i173 = fadd float %sub25.i168, 10.0		%add36.i173 = fadd float %sub25.i168, 10.0
%mul72.i179 = fmul float %c, %vecext.i276.i169		%mul72.i179 = fmul float %c, %vecext.i276.i169
%add78.i180 = fsub float %mul72.i179, 30.0		%add78.i180 = fsub float %mul72.i179, 30.0
%add79.i181 = fadd float 2.0, %add78.i180		%add79.i181 = fadd float 2.0, %add78.i180
%mul123.i184 = fmul float %add36.i173, %add79.i181		%mul123.i184 = fmul float %add36.i173, %add79.i181
%cmp.i185 = fcmp ogt float %mul123.i184, 0.000000e+00		%cmp.i185 = fcmp ogt float %mul123.i184, 0.000000e+00
ret i1 %cmp.i185		ret i1 %cmp.i185
}		}


define i1 @ExtractIdxNotConstantInt2(float %a, float %b, float %c, <4 x float> %vec, i64 %idx2) {		define i1 @ExtractIdxNotConstantInt2(float %a, float %b, float %c, <4 x float> %vec, i64 %idx2) {
; SSE-LABEL: @ExtractIdxNotConstantInt2(		; CHECK-LABEL: @ExtractIdxNotConstantInt2(
; SSE-NEXT: [[VECEXT_I291_I166:%.*]] = extractelement <4 x float> [[VEC:%.*]], i64 1		; CHECK-NEXT: [[VECEXT_I291_I166:%.*]] = extractelement <4 x float> [[VEC:%.*]], i64 1
; SSE-NEXT: [[SUB14_I167:%.*]] = fsub float undef, [[VECEXT_I291_I166]]		; CHECK-NEXT: [[SUB14_I167:%.*]] = fsub float undef, [[VECEXT_I291_I166]]
; SSE-NEXT: [[VECEXT_I276_I169:%.*]] = extractelement <4 x float> [[VEC]], i64 [[IDX2:%.*]]		; CHECK-NEXT: [[VECEXT_I276_I169:%.*]] = extractelement <4 x float> [[VEC]], i64 [[IDX2:%.*]]
; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[A:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[A:%.*]], i32 0
; SSE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[C:%.*]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[C:%.*]], i32 1
; SSE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[SUB14_I167]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[SUB14_I167]], i32 0
; SSE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_I276_I169]], i32 1		; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_I276_I169]], i32 1
; SSE-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP2]], [[TMP4]]		; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP2]], [[TMP4]]
; SSE-NEXT: [[TMP6:%.*]] = insertelement <2 x float> <float poison, float 3.000000e+01>, float [[B:%.*]], i32 0		; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> <float poison, float 3.000000e+01>, float [[B:%.*]], i32 0
; SSE-NEXT: [[TMP7:%.*]] = fsub <2 x float> [[TMP5]], [[TMP6]]		; CHECK-NEXT: [[TMP7:%.*]] = fsub <2 x float> [[TMP5]], [[TMP6]]
; SSE-NEXT: [[TMP8:%.*]] = fadd <2 x float> [[TMP7]], <float 1.000000e+01, float 2.000000e+00>		; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x float> [[TMP7]], <float 1.000000e+01, float 2.000000e+00>
; SSE-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0		; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0
; SSE-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1		; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1
; SSE-NEXT: [[MUL123_I184:%.*]] = fmul float [[TMP9]], [[TMP10]]		; CHECK-NEXT: [[MUL123_I184:%.*]] = fmul float [[TMP9]], [[TMP10]]
; SSE-NEXT: [[CMP_I185:%.*]] = fcmp ogt float [[MUL123_I184]], 0.000000e+00		; CHECK-NEXT: [[CMP_I185:%.*]] = fcmp ogt float [[MUL123_I184]], 0.000000e+00
; SSE-NEXT: ret i1 [[CMP_I185]]		; CHECK-NEXT: ret i1 [[CMP_I185]]
;
; AVX-LABEL: @ExtractIdxNotConstantInt2(
; AVX-NEXT: [[VECEXT_I291_I166:%.*]] = extractelement <4 x float> [[VEC:%.*]], i64 1
; AVX-NEXT: [[SUB14_I167:%.*]] = fsub float undef, [[VECEXT_I291_I166]]
; AVX-NEXT: [[FM:%.*]] = fmul float [[A:%.*]], [[SUB14_I167]]
; AVX-NEXT: [[SUB25_I168:%.*]] = fsub float [[FM]], [[B:%.*]]
; AVX-NEXT: [[VECEXT_I276_I169:%.*]] = extractelement <4 x float> [[VEC]], i64 [[IDX2:%.*]]
; AVX-NEXT: [[ADD36_I173:%.*]] = fadd float [[SUB25_I168]], 1.000000e+01
; AVX-NEXT: [[MUL72_I179:%.*]] = fmul float [[C:%.*]], [[VECEXT_I276_I169]]
; AVX-NEXT: [[ADD78_I180:%.*]] = fsub float [[MUL72_I179]], 3.000000e+01
; AVX-NEXT: [[ADD79_I181:%.*]] = fadd float 2.000000e+00, [[ADD78_I180]]
; AVX-NEXT: [[MUL123_I184:%.*]] = fmul float [[ADD36_I173]], [[ADD79_I181]]
; AVX-NEXT: [[CMP_I185:%.*]] = fcmp ogt float [[MUL123_I184]], 0.000000e+00
; AVX-NEXT: ret i1 [[CMP_I185]]
;		;
%vecext.i291.i166 = extractelement <4 x float> %vec, i64 1		%vecext.i291.i166 = extractelement <4 x float> %vec, i64 1
%sub14.i167 = fsub float undef, %vecext.i291.i166		%sub14.i167 = fsub float undef, %vecext.i291.i166
%fm = fmul float %a, %sub14.i167		%fm = fmul float %a, %sub14.i167
%sub25.i168 = fsub float %fm, %b		%sub25.i168 = fsub float %fm, %b
%vecext.i276.i169 = extractelement <4 x float> %vec, i64 %idx2		%vecext.i276.i169 = extractelement <4 x float> %vec, i64 %idx2
%add36.i173 = fadd float %sub25.i168, 10.0		%add36.i173 = fadd float %sub25.i168, 10.0
%mul72.i179 = fmul float %c, %vecext.i276.i169		%mul72.i179 = fmul float %c, %vecext.i276.i169
%add78.i180 = fsub float %mul72.i179, 30.0		%add78.i180 = fsub float %mul72.i179, 30.0
%add79.i181 = fadd float 2.0, %add78.i180		%add79.i181 = fadd float 2.0, %add78.i180
%mul123.i184 = fmul float %add36.i173, %add79.i181		%mul123.i184 = fmul float %add36.i173, %add79.i181
%cmp.i185 = fcmp ogt float %mul123.i184, 0.000000e+00		%cmp.i185 = fcmp ogt float %mul123.i184, 0.000000e+00
ret i1 %cmp.i185		ret i1 %cmp.i185
}		}


define i1 @foo(float %a, float %b, float %c, <4 x float> %vec, i64 %idx2) {		define i1 @foo(float %a, float %b, float %c, <4 x float> %vec, i64 %idx2) {
; SSE-LABEL: @foo(		; CHECK-LABEL: @foo(
; SSE-NEXT: [[VECEXT_I291_I166:%.*]] = extractelement <4 x float> [[VEC:%.*]], i64 0		; CHECK-NEXT: [[VECEXT_I291_I166:%.*]] = extractelement <4 x float> [[VEC:%.*]], i64 0
; SSE-NEXT: [[SUB14_I167:%.*]] = fsub float undef, [[VECEXT_I291_I166]]		; CHECK-NEXT: [[SUB14_I167:%.*]] = fsub float undef, [[VECEXT_I291_I166]]
; SSE-NEXT: [[VECEXT_I276_I169:%.*]] = extractelement <4 x float> [[VEC]], i64 1		; CHECK-NEXT: [[VECEXT_I276_I169:%.*]] = extractelement <4 x float> [[VEC]], i64 1
; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[A:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[A:%.*]], i32 0
; SSE-NEXT: [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[C:%.*]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[C:%.*]], i32 1
; SSE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[SUB14_I167]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[SUB14_I167]], i32 0
; SSE-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_I276_I169]], i32 1		; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[VECEXT_I276_I169]], i32 1
; SSE-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP2]], [[TMP4]]		; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP2]], [[TMP4]]
; SSE-NEXT: [[TMP6:%.*]] = insertelement <2 x float> <float poison, float 3.000000e+01>, float [[B:%.*]], i32 0		; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> <float poison, float 3.000000e+01>, float [[B:%.*]], i32 0
; SSE-NEXT: [[TMP7:%.*]] = fsub <2 x float> [[TMP5]], [[TMP6]]		; CHECK-NEXT: [[TMP7:%.*]] = fsub <2 x float> [[TMP5]], [[TMP6]]
; SSE-NEXT: [[TMP8:%.*]] = fadd <2 x float> [[TMP7]], <float 1.000000e+01, float 2.000000e+00>		; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x float> [[TMP7]], <float 1.000000e+01, float 2.000000e+00>
; SSE-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0		; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0
; SSE-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1		; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1
; SSE-NEXT: [[MUL123_I184:%.*]] = fmul float [[TMP9]], [[TMP10]]		; CHECK-NEXT: [[MUL123_I184:%.*]] = fmul float [[TMP9]], [[TMP10]]
; SSE-NEXT: [[CMP_I185:%.*]] = fcmp ogt float [[MUL123_I184]], 0.000000e+00		; CHECK-NEXT: [[CMP_I185:%.*]] = fcmp ogt float [[MUL123_I184]], 0.000000e+00
; SSE-NEXT: ret i1 [[CMP_I185]]		; CHECK-NEXT: ret i1 [[CMP_I185]]
;
; AVX-LABEL: @foo(
; AVX-NEXT: [[VECEXT_I291_I166:%.*]] = extractelement <4 x float> [[VEC:%.*]], i64 0
; AVX-NEXT: [[SUB14_I167:%.*]] = fsub float undef, [[VECEXT_I291_I166]]
; AVX-NEXT: [[FM:%.*]] = fmul float [[A:%.*]], [[SUB14_I167]]
; AVX-NEXT: [[SUB25_I168:%.*]] = fsub float [[FM]], [[B:%.*]]
; AVX-NEXT: [[VECEXT_I276_I169:%.*]] = extractelement <4 x float> [[VEC]], i64 1
; AVX-NEXT: [[ADD36_I173:%.*]] = fadd float [[SUB25_I168]], 1.000000e+01
; AVX-NEXT: [[MUL72_I179:%.*]] = fmul float [[C:%.*]], [[VECEXT_I276_I169]]
; AVX-NEXT: [[ADD78_I180:%.*]] = fsub float [[MUL72_I179]], 3.000000e+01
; AVX-NEXT: [[ADD79_I181:%.*]] = fadd float 2.000000e+00, [[ADD78_I180]]
; AVX-NEXT: [[MUL123_I184:%.*]] = fmul float [[ADD36_I173]], [[ADD79_I181]]
; AVX-NEXT: [[CMP_I185:%.*]] = fcmp ogt float [[MUL123_I184]], 0.000000e+00
; AVX-NEXT: ret i1 [[CMP_I185]]
;		;
%vecext.i291.i166 = extractelement <4 x float> %vec, i64 0		%vecext.i291.i166 = extractelement <4 x float> %vec, i64 0
%sub14.i167 = fsub float undef, %vecext.i291.i166		%sub14.i167 = fsub float undef, %vecext.i291.i166
%fm = fmul float %a, %sub14.i167		%fm = fmul float %a, %sub14.i167
%sub25.i168 = fsub float %fm, %b		%sub25.i168 = fsub float %fm, %b
%vecext.i276.i169 = extractelement <4 x float> %vec, i64 1		%vecext.i276.i169 = extractelement <4 x float> %vec, i64 1
%add36.i173 = fadd float %sub25.i168, 10.0		%add36.i173 = fadd float %sub25.i168, 10.0
%mul72.i179 = fmul float %c, %vecext.i276.i169		%mul72.i179 = fmul float %c, %vecext.i276.i169

llvm/test/Transforms/SLPVectorizer/X86/minimum-sizes.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-threshold=-6 -slp-vectorizer -instcombine -mattr=+sse2 -S | FileCheck %s --check-prefixes=CHECK,SSE			; RUN: opt < %s -slp-threshold=-6 -slp-vectorizer -instcombine -mattr=+sse2 -S | FileCheck %s --check-prefixes=SSE
	; RUN: opt < %s -slp-threshold=-6 -slp-vectorizer -instcombine -mattr=+avx -S | FileCheck %s --check-prefixes=CHECK,AVX			; RUN: opt < %s -slp-threshold=-6 -slp-vectorizer -instcombine -mattr=+avx -S | FileCheck %s --check-prefixes=AVX
	; RUN: opt < %s -slp-threshold=-6 -slp-vectorizer -instcombine -mattr=+avx2 -S | FileCheck %s --check-prefixes=CHECK,AVX			; RUN: opt < %s -slp-threshold=-6 -slp-vectorizer -instcombine -mattr=+avx2 -S | FileCheck %s --check-prefixes=AVX

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	; These tests ensure that we do not regress due to PR31243. Note that we set			; These tests ensure that we do not regress due to PR31243. Note that we set
	; the SLP threshold to force vectorization even when not profitable.			; the SLP threshold to force vectorization even when not profitable.

	; When computing minimum sizes, if we can prove the sign bit is zero, we can			; When computing minimum sizes, if we can prove the sign bit is zero, we can
	; zero-extend the roots back to their original sizes.			; zero-extend the roots back to their original sizes.
	;			;
	define i8 @PR31243_zext(i8 %v0, i8 %v1, i8 %v2, i8 %v3, i8* %ptr) {			define i8 @PR31243_zext(i8 %v0, i8 %v1, i8 %v2, i8 %v3, i8* %ptr) {
	; CHECK-LABEL: @PR31243_zext(			; SSE-LABEL: @PR31243_zext(
	; CHECK-NEXT: entry:			; SSE-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i8> poison, i8 [[V0:%.*]], i64 0			; SSE-NEXT: [[TMP0:%.*]] = or i8 [[V0:%.*]], 1
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i8> [[TMP0]], i8 [[V1:%.*]], i64 1			; SSE-NEXT: [[TMP1:%.*]] = or i8 [[V1:%.*]], 1
	; CHECK-NEXT: [[TMP2:%.*]] = or <2 x i8> [[TMP1]], <i8 1, i8 1>			; SSE-NEXT: [[TMP2:%.*]] = zext i8 [[TMP0]] to i64
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i8> [[TMP2]], i64 0			; SSE-NEXT: [[TMP_4:%.*]] = getelementptr inbounds i8, i8* [[PTR:%.*]], i64 [[TMP2]]
	; CHECK-NEXT: [[TMP4:%.*]] = zext i8 [[TMP3]] to i64			; SSE-NEXT: [[TMP3:%.*]] = zext i8 [[TMP1]] to i64
	; CHECK-NEXT: [[TMP_4:%.*]] = getelementptr inbounds i8, i8* [[PTR:%.*]], i64 [[TMP4]]			; SSE-NEXT: [[TMP_5:%.*]] = getelementptr inbounds i8, i8* [[PTR]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i8> [[TMP2]], i64 1			; SSE-NEXT: [[TMP_6:%.*]] = load i8, i8* [[TMP_4]], align 1
	; CHECK-NEXT: [[TMP6:%.*]] = zext i8 [[TMP5]] to i64			; SSE-NEXT: [[TMP_7:%.*]] = load i8, i8* [[TMP_5]], align 1
	; CHECK-NEXT: [[TMP_5:%.*]] = getelementptr inbounds i8, i8* [[PTR]], i64 [[TMP6]]			; SSE-NEXT: [[TMP_8:%.*]] = add i8 [[TMP_6]], [[TMP_7]]
	; CHECK-NEXT: [[TMP_6:%.*]] = load i8, i8* [[TMP_4]], align 1			; SSE-NEXT: ret i8 [[TMP_8]]
	; CHECK-NEXT: [[TMP_7:%.*]] = load i8, i8* [[TMP_5]], align 1			;
	; CHECK-NEXT: [[TMP_8:%.*]] = add i8 [[TMP_6]], [[TMP_7]]			; AVX-LABEL: @PR31243_zext(
	; CHECK-NEXT: ret i8 [[TMP_8]]			; AVX-NEXT: entry:
				; AVX-NEXT: [[TMP0:%.*]] = insertelement <2 x i8> poison, i8 [[V0:%.*]], i64 0
				; AVX-NEXT: [[TMP1:%.*]] = insertelement <2 x i8> [[TMP0]], i8 [[V1:%.*]], i64 1
				; AVX-NEXT: [[TMP2:%.*]] = or <2 x i8> [[TMP1]], <i8 1, i8 1>
				; AVX-NEXT: [[TMP3:%.*]] = extractelement <2 x i8> [[TMP2]], i64 0
				; AVX-NEXT: [[TMP4:%.*]] = zext i8 [[TMP3]] to i64
				; AVX-NEXT: [[TMP_4:%.*]] = getelementptr inbounds i8, i8* [[PTR:%.*]], i64 [[TMP4]]
				; AVX-NEXT: [[TMP5:%.*]] = extractelement <2 x i8> [[TMP2]], i64 1
				; AVX-NEXT: [[TMP6:%.*]] = zext i8 [[TMP5]] to i64
				; AVX-NEXT: [[TMP_5:%.*]] = getelementptr inbounds i8, i8* [[PTR]], i64 [[TMP6]]
				; AVX-NEXT: [[TMP_6:%.*]] = load i8, i8* [[TMP_4]], align 1
				; AVX-NEXT: [[TMP_7:%.*]] = load i8, i8* [[TMP_5]], align 1
				; AVX-NEXT: [[TMP_8:%.*]] = add i8 [[TMP_6]], [[TMP_7]]
				; AVX-NEXT: ret i8 [[TMP_8]]
	;			;
	entry:			entry:
	%tmp_0 = zext i8 %v0 to i32			%tmp_0 = zext i8 %v0 to i32
	%tmp_1 = zext i8 %v1 to i32			%tmp_1 = zext i8 %v1 to i32
	%tmp_2 = or i32 %tmp_0, 1			%tmp_2 = or i32 %tmp_0, 1
	%tmp_3 = or i32 %tmp_1, 1			%tmp_3 = or i32 %tmp_1, 1
	%tmp_4 = getelementptr inbounds i8, i8* %ptr, i32 %tmp_2			%tmp_4 = getelementptr inbounds i8, i8* %ptr, i32 %tmp_2
	%tmp_5 = getelementptr inbounds i8, i8* %ptr, i32 %tmp_3			%tmp_5 = getelementptr inbounds i8, i8* %ptr, i32 %tmp_3

llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll

for.body.lr.ph.i:
ret void		ret void
}		}

; Function Attrs: norecurse nounwind uwtable		; Function Attrs: norecurse nounwind uwtable
define void @pr35497() local_unnamed_addr #0 {		define void @pr35497() local_unnamed_addr #0 {
; SSE-LABEL: @pr35497(		; SSE-LABEL: @pr35497(
; SSE-NEXT: entry:		; SSE-NEXT: entry:
; SSE-NEXT: [[TMP0:%.*]] = load i64, i64* undef, align 1		; SSE-NEXT: [[TMP0:%.*]] = load i64, i64* undef, align 1
		; SSE-NEXT: [[AND:%.*]] = shl i64 [[TMP0]], 2
		; SSE-NEXT: [[SHL:%.*]] = and i64 [[AND]], 20
; SSE-NEXT: [[ADD:%.*]] = add i64 undef, undef		; SSE-NEXT: [[ADD:%.*]] = add i64 undef, undef
; SSE-NEXT: store i64 [[ADD]], i64* undef, align 1		; SSE-NEXT: store i64 [[ADD]], i64* undef, align 1
		; SSE-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 5
		; SSE-NEXT: [[AND_1:%.*]] = shl i64 undef, 2
		; SSE-NEXT: [[SHL_1:%.*]] = and i64 [[AND_1]], 20
		; SSE-NEXT: [[SHR_1:%.*]] = lshr i64 undef, 6
		; SSE-NEXT: [[ADD_1:%.*]] = add nuw nsw i64 [[SHL]], [[SHR_1]]
; SSE-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 4		; SSE-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 4
		; SSE-NEXT: [[SHR_2:%.*]] = lshr i64 undef, 6
		; SSE-NEXT: [[ADD_2:%.*]] = add nuw nsw i64 [[SHL_1]], [[SHR_2]]
		; SSE-NEXT: [[AND_4:%.*]] = shl i64 [[ADD]], 2
		; SSE-NEXT: [[SHL_4:%.*]] = and i64 [[AND_4]], 20
		; SSE-NEXT: [[ARRAYIDX2_5:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 1
		; SSE-NEXT: store i64 [[ADD_1]], i64* [[ARRAYIDX2_5]], align 1
		; SSE-NEXT: [[AND_5:%.*]] = shl nuw nsw i64 [[ADD_1]], 2
		; SSE-NEXT: [[SHL_5:%.*]] = and i64 [[AND_5]], 20
		; SSE-NEXT: [[SHR_5:%.*]] = lshr i64 [[ADD_1]], 6
		; SSE-NEXT: [[ADD_5:%.*]] = add nuw nsw i64 [[SHL_4]], [[SHR_5]]
		; SSE-NEXT: store i64 [[ADD_5]], i64* [[ARRAYIDX2_1]], align 1
; SSE-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 0		; SSE-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 0
; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> <i64 undef, i64 poison>, i64 [[TMP0]], i32 1		; SSE-NEXT: store i64 [[ADD_2]], i64* [[ARRAYIDX2_6]], align 1
; SSE-NEXT: [[TMP2:%.*]] = shl <2 x i64> [[TMP1]], <i64 2, i64 2>		; SSE-NEXT: [[SHR_6:%.*]] = lshr i64 [[ADD_2]], 6
; SSE-NEXT: [[TMP3:%.*]] = and <2 x i64> [[TMP2]], <i64 20, i64 20>		; SSE-NEXT: [[ADD_6:%.*]] = add nuw nsw i64 [[SHL_5]], [[SHR_6]]
; SSE-NEXT: [[TMP4:%.*]] = add nuw nsw <2 x i64> [[TMP3]], zeroinitializer		; SSE-NEXT: store i64 [[ADD_6]], i64* [[ARRAYIDX2_2]], align 1
; SSE-NEXT: [[TMP5:%.*]] = extractelement <2 x i64> [[TMP4]], i32 1
; SSE-NEXT: [[TMP6:%.*]] = bitcast i64* [[ARRAYIDX2_6]] to <2 x i64>*
; SSE-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP6]], align 1
; SSE-NEXT: [[TMP7:%.*]] = insertelement <2 x i64> poison, i64 [[TMP5]], i32 0
; SSE-NEXT: [[TMP8:%.*]] = insertelement <2 x i64> [[TMP7]], i64 [[ADD]], i32 1
; SSE-NEXT: [[TMP9:%.*]] = shl <2 x i64> [[TMP8]], <i64 2, i64 2>
; SSE-NEXT: [[TMP10:%.*]] = and <2 x i64> [[TMP9]], <i64 20, i64 20>
; SSE-NEXT: [[TMP11:%.*]] = lshr <2 x i64> [[TMP4]], <i64 6, i64 6>
; SSE-NEXT: [[TMP12:%.*]] = add nuw nsw <2 x i64> [[TMP10]], [[TMP11]]
; SSE-NEXT: [[TMP13:%.*]] = bitcast i64* [[ARRAYIDX2_2]] to <2 x i64>*
; SSE-NEXT: store <2 x i64> [[TMP12]], <2 x i64>* [[TMP13]], align 1
; SSE-NEXT: ret void		; SSE-NEXT: ret void
;		;
; AVX-LABEL: @pr35497(		; AVX-LABEL: @pr35497(
; AVX-NEXT: entry:		; AVX-NEXT: entry:
; AVX-NEXT: [[TMP0:%.*]] = load i64, i64* undef, align 1		; AVX-NEXT: [[TMP0:%.*]] = load i64, i64* undef, align 1
; AVX-NEXT: [[ADD:%.*]] = add i64 undef, undef		; AVX-NEXT: [[ADD:%.*]] = add i64 undef, undef
; AVX-NEXT: store i64 [[ADD]], i64* undef, align 1		; AVX-NEXT: store i64 [[ADD]], i64* undef, align 1
; AVX-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 4		; AVX-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds [0 x i64], [0 x i64]* undef, i64 0, i64 4

llvm/test/Transforms/SLPVectorizer/X86/pr47629-inseltpoison.ll

	; SSE-NEXT: [[TMP32:%.*]] = add i32 [[TMP31]], 4			; SSE-NEXT: [[TMP32:%.*]] = add i32 [[TMP31]], 4
	; SSE-NEXT: store i32 [[TMP32]], i32* [[TMP29]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: store i32 [[TMP32]], i32* [[TMP29]], align 4, !tbaa [[TBAA0]]
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @gather_load_3(			; AVX-LABEL: @gather_load_3(
	; AVX-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 11			; AVX-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 11
	; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4			; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4
	; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 15			; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 15
	; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 18			; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 4
	; AVX-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9			; AVX-NEXT: [[TMP7:%.*]] = load i32, i32* [[TMP1]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6			; AVX-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP3]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21			; AVX-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP4]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP1]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP11:%.*]] = load i32, i32* [[TMP3]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> poison, i32 [[TMP7]], i64 0
	; AVX-NEXT: [[TMP12:%.*]] = load i32, i32* [[TMP4]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP8]], i64 1
	; AVX-NEXT: [[TMP13:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP9]], i64 2
	; AVX-NEXT: [[TMP14:%.*]] = load i32, i32* [[TMP6]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[TMP14:%.*]] = insertelement <4 x i32> [[TMP13]], i32 [[TMP10]], i64 3
	; AVX-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP7]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[TMP15:%.*]] = add <4 x i32> [[TMP14]], <i32 1, i32 2, i32 3, i32 4>
	; AVX-NEXT: [[TMP16:%.*]] = load i32, i32* [[TMP8]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[TMP16:%.*]] = bitcast i32* [[TMP0]] to <4 x i32>*
	; AVX-NEXT: [[TMP17:%.*]] = load i32, i32* [[TMP9]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: store <4 x i32> [[TMP15]], <4 x i32>* [[TMP16]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP18:%.*]] = insertelement <8 x i32> poison, i32 [[TMP10]], i64 0			; AVX-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 18
	; AVX-NEXT: [[TMP19:%.*]] = insertelement <8 x i32> [[TMP18]], i32 [[TMP11]], i64 1			; AVX-NEXT: [[TMP18:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9
	; AVX-NEXT: [[TMP20:%.*]] = insertelement <8 x i32> [[TMP19]], i32 [[TMP12]], i64 2			; AVX-NEXT: [[TMP19:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6
	; AVX-NEXT: [[TMP21:%.*]] = insertelement <8 x i32> [[TMP20]], i32 [[TMP13]], i64 3			; AVX-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
	; AVX-NEXT: [[TMP22:%.*]] = insertelement <8 x i32> [[TMP21]], i32 [[TMP14]], i64 4			; AVX-NEXT: [[TMP21:%.*]] = load i32, i32* [[TMP17]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP23:%.*]] = insertelement <8 x i32> [[TMP22]], i32 [[TMP15]], i64 5			; AVX-NEXT: [[TMP22:%.*]] = load i32, i32* [[TMP18]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP24:%.*]] = insertelement <8 x i32> [[TMP23]], i32 [[TMP16]], i64 6			; AVX-NEXT: [[TMP23:%.*]] = load i32, i32* [[TMP19]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP25:%.*]] = insertelement <8 x i32> [[TMP24]], i32 [[TMP17]], i64 7			; AVX-NEXT: [[TMP24:%.*]] = load i32, i32* [[TMP20]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP26:%.*]] = add <8 x i32> [[TMP25]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>			; AVX-NEXT: [[TMP25:%.*]] = insertelement <4 x i32> poison, i32 [[TMP21]], i64 0
	; AVX-NEXT: [[TMP27:%.*]] = bitcast i32* [[TMP0:%.*]] to <8 x i32>*			; AVX-NEXT: [[TMP26:%.*]] = insertelement <4 x i32> [[TMP25]], i32 [[TMP22]], i64 1
	; AVX-NEXT: store <8 x i32> [[TMP26]], <8 x i32>* [[TMP27]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[TMP27:%.*]] = insertelement <4 x i32> [[TMP26]], i32 [[TMP23]], i64 2
				; AVX-NEXT: [[TMP28:%.*]] = insertelement <4 x i32> [[TMP27]], i32 [[TMP24]], i64 3
				; AVX-NEXT: [[TMP29:%.*]] = add <4 x i32> [[TMP28]], <i32 1, i32 2, i32 3, i32 4>
				; AVX-NEXT: [[TMP30:%.*]] = bitcast i32* [[TMP6]] to <4 x i32>*
				; AVX-NEXT: store <4 x i32> [[TMP29]], <4 x i32>* [[TMP30]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	; AVX2-LABEL: @gather_load_3(			; AVX2-LABEL: @gather_load_3(
	; AVX2-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 11			; AVX2-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 11
	; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4			; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4
	; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 15			; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 15
	; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 18			; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 18
	; AVX2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9			; AVX2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9
	▲ Show 20 Lines • Show All 164 Lines • ▼ Show 20 Lines
	; SSE-NEXT: store i32 [[T28]], i32* [[T25]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: store i32 [[T28]], i32* [[T25]], align 4, !tbaa [[TBAA0]]
	; SSE-NEXT: store i32 [[T32]], i32* [[T29]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: store i32 [[T32]], i32* [[T29]], align 4, !tbaa [[TBAA0]]
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @gather_load_4(			; AVX-LABEL: @gather_load_4(
	; AVX-NEXT: [[T6:%.*]] = getelementptr inbounds i32, i32* [[T1:%.*]], i64 11			; AVX-NEXT: [[T6:%.*]] = getelementptr inbounds i32, i32* [[T1:%.*]], i64 11
	; AVX-NEXT: [[T10:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 4			; AVX-NEXT: [[T10:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 4
	; AVX-NEXT: [[T14:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 15			; AVX-NEXT: [[T14:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 15
				; AVX-NEXT: [[T17:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 4
	; AVX-NEXT: [[T18:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 18			; AVX-NEXT: [[T18:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 18
	; AVX-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9			; AVX-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9
	; AVX-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6			; AVX-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6
	; AVX-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21			; AVX-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21
	; AVX-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[T7:%.*]] = load i32, i32* [[T6]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[T7:%.*]] = load i32, i32* [[T6]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[T11:%.*]] = load i32, i32* [[T10]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[T11:%.*]] = load i32, i32* [[T10]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[T15:%.*]] = load i32, i32* [[T14]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[T15:%.*]] = load i32, i32* [[T14]], align 4, !tbaa [[TBAA0]]
				; AVX-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[T3]], i64 0
				; AVX-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[T7]], i64 1
				; AVX-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[T11]], i64 2
				; AVX-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[T15]], i64 3
				; AVX-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP4]], <i32 1, i32 2, i32 3, i32 4>
				; AVX-NEXT: [[TMP6:%.*]] = bitcast i32* [[T0]] to <4 x i32>*
				; AVX-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[T19:%.*]] = load i32, i32* [[T18]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[T19:%.*]] = load i32, i32* [[T18]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[T27:%.*]] = load i32, i32* [[T26]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[T27:%.*]] = load i32, i32* [[T26]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[T3]], i64 0			; AVX-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> poison, i32 [[T19]], i64 0
	; AVX-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[T7]], i64 1			; AVX-NEXT: [[TMP8:%.*]] = insertelement <4 x i32> [[TMP7]], i32 [[T23]], i64 1
	; AVX-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[T11]], i64 2			; AVX-NEXT: [[TMP9:%.*]] = insertelement <4 x i32> [[TMP8]], i32 [[T27]], i64 2
	; AVX-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[T15]], i64 3			; AVX-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> [[TMP9]], i32 [[T31]], i64 3
	; AVX-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[T19]], i64 4			; AVX-NEXT: [[TMP11:%.*]] = add <4 x i32> [[TMP10]], <i32 1, i32 2, i32 3, i32 4>
	; AVX-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[T23]], i64 5			; AVX-NEXT: [[TMP12:%.*]] = bitcast i32* [[T17]] to <4 x i32>*
	; AVX-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[T27]], i64 6			; AVX-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* [[TMP12]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[T31]], i64 7
	; AVX-NEXT: [[TMP9:%.*]] = add <8 x i32> [[TMP8]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
	; AVX-NEXT: [[TMP10:%.*]] = bitcast i32* [[T0:%.*]] to <8 x i32>*
	; AVX-NEXT: store <8 x i32> [[TMP9]], <8 x i32>* [[TMP10]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	; AVX2-LABEL: @gather_load_4(			; AVX2-LABEL: @gather_load_4(
	; AVX2-NEXT: [[T6:%.*]] = getelementptr inbounds i32, i32* [[T1:%.*]], i64 11			; AVX2-NEXT: [[T6:%.*]] = getelementptr inbounds i32, i32* [[T1:%.*]], i64 11
	; AVX2-NEXT: [[T10:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 4			; AVX2-NEXT: [[T10:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 4
	; AVX2-NEXT: [[T14:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 15			; AVX2-NEXT: [[T14:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 15
	; AVX2-NEXT: [[T18:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 18			; AVX2-NEXT: [[T18:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 18
	; AVX2-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9			; AVX2-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9
	▲ Show 20 Lines • Show All 402 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/pr47629.ll

	Show First 20 Lines • Show All 193 Lines • ▼ Show 20 Lines
	; SSE-NEXT: [[TMP32:%.*]] = add i32 [[TMP31]], 4			; SSE-NEXT: [[TMP32:%.*]] = add i32 [[TMP31]], 4
	; SSE-NEXT: store i32 [[TMP32]], i32* [[TMP29]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: store i32 [[TMP32]], i32* [[TMP29]], align 4, !tbaa [[TBAA0]]
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @gather_load_3(			; AVX-LABEL: @gather_load_3(
	; AVX-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 11			; AVX-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 11
	; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4			; AVX-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4
	; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 15			; AVX-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 15
	; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 18			; AVX-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 4
	; AVX-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9			; AVX-NEXT: [[TMP7:%.*]] = load i32, i32* [[TMP1]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6			; AVX-NEXT: [[TMP8:%.*]] = load i32, i32* [[TMP3]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21			; AVX-NEXT: [[TMP9:%.*]] = load i32, i32* [[TMP4]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP1]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[TMP10:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP11:%.*]] = load i32, i32* [[TMP3]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[TMP11:%.*]] = insertelement <4 x i32> poison, i32 [[TMP7]], i64 0
	; AVX-NEXT: [[TMP12:%.*]] = load i32, i32* [[TMP4]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[TMP12:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP8]], i64 1
	; AVX-NEXT: [[TMP13:%.*]] = load i32, i32* [[TMP5]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[TMP9]], i64 2
	; AVX-NEXT: [[TMP14:%.*]] = load i32, i32* [[TMP6]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[TMP14:%.*]] = insertelement <4 x i32> [[TMP13]], i32 [[TMP10]], i64 3
	; AVX-NEXT: [[TMP15:%.*]] = load i32, i32* [[TMP7]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[TMP15:%.*]] = add <4 x i32> [[TMP14]], <i32 1, i32 2, i32 3, i32 4>
	; AVX-NEXT: [[TMP16:%.*]] = load i32, i32* [[TMP8]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[TMP16:%.*]] = bitcast i32* [[TMP0]] to <4 x i32>*
	; AVX-NEXT: [[TMP17:%.*]] = load i32, i32* [[TMP9]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: store <4 x i32> [[TMP15]], <4 x i32>* [[TMP16]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP18:%.*]] = insertelement <8 x i32> poison, i32 [[TMP10]], i64 0			; AVX-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 18
	; AVX-NEXT: [[TMP19:%.*]] = insertelement <8 x i32> [[TMP18]], i32 [[TMP11]], i64 1			; AVX-NEXT: [[TMP18:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9
	; AVX-NEXT: [[TMP20:%.*]] = insertelement <8 x i32> [[TMP19]], i32 [[TMP12]], i64 2			; AVX-NEXT: [[TMP19:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6
	; AVX-NEXT: [[TMP21:%.*]] = insertelement <8 x i32> [[TMP20]], i32 [[TMP13]], i64 3			; AVX-NEXT: [[TMP20:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
	; AVX-NEXT: [[TMP22:%.*]] = insertelement <8 x i32> [[TMP21]], i32 [[TMP14]], i64 4			; AVX-NEXT: [[TMP21:%.*]] = load i32, i32* [[TMP17]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP23:%.*]] = insertelement <8 x i32> [[TMP22]], i32 [[TMP15]], i64 5			; AVX-NEXT: [[TMP22:%.*]] = load i32, i32* [[TMP18]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP24:%.*]] = insertelement <8 x i32> [[TMP23]], i32 [[TMP16]], i64 6			; AVX-NEXT: [[TMP23:%.*]] = load i32, i32* [[TMP19]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP25:%.*]] = insertelement <8 x i32> [[TMP24]], i32 [[TMP17]], i64 7			; AVX-NEXT: [[TMP24:%.*]] = load i32, i32* [[TMP20]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP26:%.*]] = add <8 x i32> [[TMP25]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>			; AVX-NEXT: [[TMP25:%.*]] = insertelement <4 x i32> poison, i32 [[TMP21]], i64 0
	; AVX-NEXT: [[TMP27:%.*]] = bitcast i32* [[TMP0:%.*]] to <8 x i32>*			; AVX-NEXT: [[TMP26:%.*]] = insertelement <4 x i32> [[TMP25]], i32 [[TMP22]], i64 1
	; AVX-NEXT: store <8 x i32> [[TMP26]], <8 x i32>* [[TMP27]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[TMP27:%.*]] = insertelement <4 x i32> [[TMP26]], i32 [[TMP23]], i64 2
				; AVX-NEXT: [[TMP28:%.*]] = insertelement <4 x i32> [[TMP27]], i32 [[TMP24]], i64 3
				; AVX-NEXT: [[TMP29:%.*]] = add <4 x i32> [[TMP28]], <i32 1, i32 2, i32 3, i32 4>
				; AVX-NEXT: [[TMP30:%.*]] = bitcast i32* [[TMP6]] to <4 x i32>*
				; AVX-NEXT: store <4 x i32> [[TMP29]], <4 x i32>* [[TMP30]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	; AVX2-LABEL: @gather_load_3(			; AVX2-LABEL: @gather_load_3(
	; AVX2-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 11			; AVX2-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[TMP1:%.*]], i64 11
	; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4			; AVX2-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 4
	; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 15			; AVX2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 15
	; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 18			; AVX2-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 18
	; AVX2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9			; AVX2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9
	▲ Show 20 Lines • Show All 164 Lines • ▼ Show 20 Lines
	; SSE-NEXT: store i32 [[T28]], i32* [[T25]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: store i32 [[T28]], i32* [[T25]], align 4, !tbaa [[TBAA0]]
	; SSE-NEXT: store i32 [[T32]], i32* [[T29]], align 4, !tbaa [[TBAA0]]			; SSE-NEXT: store i32 [[T32]], i32* [[T29]], align 4, !tbaa [[TBAA0]]
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @gather_load_4(			; AVX-LABEL: @gather_load_4(
	; AVX-NEXT: [[T6:%.*]] = getelementptr inbounds i32, i32* [[T1:%.*]], i64 11			; AVX-NEXT: [[T6:%.*]] = getelementptr inbounds i32, i32* [[T1:%.*]], i64 11
	; AVX-NEXT: [[T10:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 4			; AVX-NEXT: [[T10:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 4
	; AVX-NEXT: [[T14:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 15			; AVX-NEXT: [[T14:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 15
				; AVX-NEXT: [[T17:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 4
	; AVX-NEXT: [[T18:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 18			; AVX-NEXT: [[T18:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 18
	; AVX-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9			; AVX-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9
	; AVX-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6			; AVX-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6
	; AVX-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21			; AVX-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21
	; AVX-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[T7:%.*]] = load i32, i32* [[T6]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[T7:%.*]] = load i32, i32* [[T6]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[T11:%.*]] = load i32, i32* [[T10]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[T11:%.*]] = load i32, i32* [[T10]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[T15:%.*]] = load i32, i32* [[T14]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[T15:%.*]] = load i32, i32* [[T14]], align 4, !tbaa [[TBAA0]]
				; AVX-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[T3]], i64 0
				; AVX-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[T7]], i64 1
				; AVX-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[T11]], i64 2
				; AVX-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[T15]], i64 3
				; AVX-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP4]], <i32 1, i32 2, i32 3, i32 4>
				; AVX-NEXT: [[TMP6:%.*]] = bitcast i32* [[T0]] to <4 x i32>*
				; AVX-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[T19:%.*]] = load i32, i32* [[T18]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[T19:%.*]] = load i32, i32* [[T18]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[T23:%.*]] = load i32, i32* [[T22]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[T27:%.*]] = load i32, i32* [[T26]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[T27:%.*]] = load i32, i32* [[T26]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, !tbaa [[TBAA0]]			; AVX-NEXT: [[T31:%.*]] = load i32, i32* [[T30]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[T3]], i64 0			; AVX-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> poison, i32 [[T19]], i64 0
	; AVX-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[T7]], i64 1			; AVX-NEXT: [[TMP8:%.*]] = insertelement <4 x i32> [[TMP7]], i32 [[T23]], i64 1
	; AVX-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[T11]], i64 2			; AVX-NEXT: [[TMP9:%.*]] = insertelement <4 x i32> [[TMP8]], i32 [[T27]], i64 2
	; AVX-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[T15]], i64 3			; AVX-NEXT: [[TMP10:%.*]] = insertelement <4 x i32> [[TMP9]], i32 [[T31]], i64 3
	; AVX-NEXT: [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[T19]], i64 4			; AVX-NEXT: [[TMP11:%.*]] = add <4 x i32> [[TMP10]], <i32 1, i32 2, i32 3, i32 4>
	; AVX-NEXT: [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[T23]], i64 5			; AVX-NEXT: [[TMP12:%.*]] = bitcast i32* [[T17]] to <4 x i32>*
	; AVX-NEXT: [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[T27]], i64 6			; AVX-NEXT: store <4 x i32> [[TMP11]], <4 x i32>* [[TMP12]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: [[TMP8:%.*]] = insertelement <8 x i32> [[TMP7]], i32 [[T31]], i64 7
	; AVX-NEXT: [[TMP9:%.*]] = add <8 x i32> [[TMP8]], <i32 1, i32 2, i32 3, i32 4, i32 1, i32 2, i32 3, i32 4>
	; AVX-NEXT: [[TMP10:%.*]] = bitcast i32* [[T0:%.*]] to <8 x i32>*
	; AVX-NEXT: store <8 x i32> [[TMP9]], <8 x i32>* [[TMP10]], align 4, !tbaa [[TBAA0]]
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	; AVX2-LABEL: @gather_load_4(			; AVX2-LABEL: @gather_load_4(
	; AVX2-NEXT: [[T6:%.*]] = getelementptr inbounds i32, i32* [[T1:%.*]], i64 11			; AVX2-NEXT: [[T6:%.*]] = getelementptr inbounds i32, i32* [[T1:%.*]], i64 11
	; AVX2-NEXT: [[T10:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 4			; AVX2-NEXT: [[T10:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 4
	; AVX2-NEXT: [[T14:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 15			; AVX2-NEXT: [[T14:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 15
	; AVX2-NEXT: [[T18:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 18			; AVX2-NEXT: [[T18:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 18
	; AVX2-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9			; AVX2-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9
	▲ Show 20 Lines • Show All 424 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll

Show First 20 Lines • Show All 221 Lines • ▼ Show 20 Lines	;
ret i1 %s2		ret i1 %s2
}		}

; TODO: This is better than all-scalar and still safe,		; TODO: This is better than all-scalar and still safe,
; but we want this to be 2 reductions with glue		; but we want this to be 2 reductions with glue
; logic...or a wide reduction?		; logic...or a wide reduction?

define i1 @logical_and_icmp_clamp(<4 x i32> %x) {		define i1 @logical_and_icmp_clamp(<4 x i32> %x) {
; CHECK-LABEL: @logical_and_icmp_clamp(		; SSE-LABEL: @logical_and_icmp_clamp(
; CHECK-NEXT: [[TMP1:%.*]] = icmp slt <4 x i32> [[X:%.*]], <i32 42, i32 42, i32 42, i32 42>		; SSE-NEXT: [[TMP1:%.*]] = icmp slt <4 x i32> [[X:%.*]], <i32 42, i32 42, i32 42, i32 42>
; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[X]], <i32 17, i32 17, i32 17, i32 17>		; SSE-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[X]], <i32 17, i32 17, i32 17, i32 17>
; CHECK-NEXT: [[TMP3:%.*]] = freeze <4 x i1> [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = freeze <4 x i1> [[TMP2]]
; CHECK-NEXT: [[TMP4:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP3]])		; SSE-NEXT: [[TMP4:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP3]])
; CHECK-NEXT: [[TMP5:%.*]] = freeze <4 x i1> [[TMP1]]		; SSE-NEXT: [[TMP5:%.*]] = freeze <4 x i1> [[TMP1]]
; CHECK-NEXT: [[TMP6:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP5]])		; SSE-NEXT: [[TMP6:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP5]])
; CHECK-NEXT: [[OP_RDX:%.*]] = select i1 [[TMP4]], i1 [[TMP6]], i1 false		; SSE-NEXT: [[OP_RDX:%.*]] = select i1 [[TMP4]], i1 [[TMP6]], i1 false
; CHECK-NEXT: ret i1 [[OP_RDX]]		; SSE-NEXT: ret i1 [[OP_RDX]]
		;
		; AVX-LABEL: @logical_and_icmp_clamp(
		; AVX-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0
		; AVX-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1
		; AVX-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2
		; AVX-NEXT: [[X3:%.*]] = extractelement <4 x i32> [[X]], i32 3
		; AVX-NEXT: [[C0:%.*]] = icmp slt i32 [[X0]], 42
		; AVX-NEXT: [[C1:%.*]] = icmp slt i32 [[X1]], 42
		; AVX-NEXT: [[C2:%.*]] = icmp slt i32 [[X2]], 42
		; AVX-NEXT: [[C3:%.*]] = icmp slt i32 [[X3]], 42
		; AVX-NEXT: [[D0:%.*]] = icmp sgt i32 [[X0]], 17
		; AVX-NEXT: [[D1:%.*]] = icmp sgt i32 [[X1]], 17
		; AVX-NEXT: [[D2:%.*]] = icmp sgt i32 [[X2]], 17
		; AVX-NEXT: [[D3:%.*]] = icmp sgt i32 [[X3]], 17
		; AVX-NEXT: [[S1:%.*]] = select i1 [[C0]], i1 [[C1]], i1 false
		; AVX-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 [[C2]], i1 false
		; AVX-NEXT: [[S3:%.*]] = select i1 [[S2]], i1 [[C3]], i1 false
		; AVX-NEXT: [[S4:%.*]] = select i1 [[S3]], i1 [[D0]], i1 false
		; AVX-NEXT: [[S5:%.*]] = select i1 [[S4]], i1 [[D1]], i1 false
		; AVX-NEXT: [[S6:%.*]] = select i1 [[S5]], i1 [[D2]], i1 false
		; AVX-NEXT: [[S7:%.*]] = select i1 [[S6]], i1 [[D3]], i1 false
		; AVX-NEXT: ret i1 [[S7]]
;		;
%x0 = extractelement <4 x i32> %x, i32 0		%x0 = extractelement <4 x i32> %x, i32 0
%x1 = extractelement <4 x i32> %x, i32 1		%x1 = extractelement <4 x i32> %x, i32 1
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x3 = extractelement <4 x i32> %x, i32 3		%x3 = extractelement <4 x i32> %x, i32 3
%c0 = icmp slt i32 %x0, 42		%c0 = icmp slt i32 %x0, 42
%c1 = icmp slt i32 %x1, 42		%c1 = icmp slt i32 %x1, 42
%c2 = icmp slt i32 %x2, 42		%c2 = icmp slt i32 %x2, 42
%c3 = icmp slt i32 %x3, 42		%c3 = icmp slt i32 %x3, 42
%d0 = icmp sgt i32 %x0, 17		%d0 = icmp sgt i32 %x0, 17
%d1 = icmp sgt i32 %x1, 17		%d1 = icmp sgt i32 %x1, 17
%d2 = icmp sgt i32 %x2, 17		%d2 = icmp sgt i32 %x2, 17
%d3 = icmp sgt i32 %x3, 17		%d3 = icmp sgt i32 %x3, 17
%s1 = select i1 %c0, i1 %c1, i1 false		%s1 = select i1 %c0, i1 %c1, i1 false
%s2 = select i1 %s1, i1 %c2, i1 false		%s2 = select i1 %s1, i1 %c2, i1 false
%s3 = select i1 %s2, i1 %c3, i1 false		%s3 = select i1 %s2, i1 %c3, i1 false
%s4 = select i1 %s3, i1 %d0, i1 false		%s4 = select i1 %s3, i1 %d0, i1 false
%s5 = select i1 %s4, i1 %d1, i1 false		%s5 = select i1 %s4, i1 %d1, i1 false
%s6 = select i1 %s5, i1 %d2, i1 false		%s6 = select i1 %s5, i1 %d2, i1 false
%s7 = select i1 %s6, i1 %d3, i1 false		%s7 = select i1 %s6, i1 %d3, i1 false
ret i1 %s7		ret i1 %s7
}		}

define i1 @logical_and_icmp_clamp_extra_use_cmp(<4 x i32> %x) {		define i1 @logical_and_icmp_clamp_extra_use_cmp(<4 x i32> %x) {
; CHECK-LABEL: @logical_and_icmp_clamp_extra_use_cmp(		; SSE-LABEL: @logical_and_icmp_clamp_extra_use_cmp(
; CHECK-NEXT: [[TMP1:%.*]] = icmp slt <4 x i32> [[X:%.*]], <i32 42, i32 42, i32 42, i32 42>		; SSE-NEXT: [[TMP1:%.*]] = icmp slt <4 x i32> [[X:%.*]], <i32 42, i32 42, i32 42, i32 42>
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i1> [[TMP1]], i32 2		; SSE-NEXT: [[TMP2:%.*]] = extractelement <4 x i1> [[TMP1]], i32 2
; CHECK-NEXT: call void @use1(i1 [[TMP2]])		; SSE-NEXT: call void @use1(i1 [[TMP2]])
; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt <4 x i32> [[X]], <i32 17, i32 17, i32 17, i32 17>		; SSE-NEXT: [[TMP3:%.*]] = icmp sgt <4 x i32> [[X]], <i32 17, i32 17, i32 17, i32 17>
; CHECK-NEXT: [[TMP4:%.*]] = freeze <4 x i1> [[TMP3]]		; SSE-NEXT: [[TMP4:%.*]] = freeze <4 x i1> [[TMP3]]
; CHECK-NEXT: [[TMP5:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP4]])		; SSE-NEXT: [[TMP5:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP4]])
; CHECK-NEXT: [[TMP6:%.*]] = freeze <4 x i1> [[TMP1]]		; SSE-NEXT: [[TMP6:%.*]] = freeze <4 x i1> [[TMP1]]
; CHECK-NEXT: [[TMP7:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP6]])		; SSE-NEXT: [[TMP7:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP6]])
; CHECK-NEXT: [[OP_RDX:%.*]] = select i1 [[TMP5]], i1 [[TMP7]], i1 false		; SSE-NEXT: [[OP_RDX:%.*]] = select i1 [[TMP5]], i1 [[TMP7]], i1 false
; CHECK-NEXT: ret i1 [[OP_RDX]]		; SSE-NEXT: ret i1 [[OP_RDX]]
		;
		; AVX-LABEL: @logical_and_icmp_clamp_extra_use_cmp(
		; AVX-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0
		; AVX-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1
		; AVX-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2
		; AVX-NEXT: [[X3:%.*]] = extractelement <4 x i32> [[X]], i32 3
		; AVX-NEXT: [[C0:%.*]] = icmp slt i32 [[X0]], 42
		; AVX-NEXT: [[C1:%.*]] = icmp slt i32 [[X1]], 42
		; AVX-NEXT: [[C2:%.*]] = icmp slt i32 [[X2]], 42
		; AVX-NEXT: call void @use1(i1 [[C2]])
		; AVX-NEXT: [[C3:%.*]] = icmp slt i32 [[X3]], 42
		; AVX-NEXT: [[D0:%.*]] = icmp sgt i32 [[X0]], 17
		; AVX-NEXT: [[D1:%.*]] = icmp sgt i32 [[X1]], 17
		; AVX-NEXT: [[D2:%.*]] = icmp sgt i32 [[X2]], 17
		; AVX-NEXT: [[D3:%.*]] = icmp sgt i32 [[X3]], 17
		; AVX-NEXT: [[S1:%.*]] = select i1 [[C0]], i1 [[C1]], i1 false
		; AVX-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 [[C2]], i1 false
		; AVX-NEXT: [[S3:%.*]] = select i1 [[S2]], i1 [[C3]], i1 false
		; AVX-NEXT: [[S4:%.*]] = select i1 [[S3]], i1 [[D0]], i1 false
		; AVX-NEXT: [[S5:%.*]] = select i1 [[S4]], i1 [[D1]], i1 false
		; AVX-NEXT: [[S6:%.*]] = select i1 [[S5]], i1 [[D2]], i1 false
		; AVX-NEXT: [[S7:%.*]] = select i1 [[S6]], i1 [[D3]], i1 false
		; AVX-NEXT: ret i1 [[S7]]
;		;
%x0 = extractelement <4 x i32> %x, i32 0		%x0 = extractelement <4 x i32> %x, i32 0
%x1 = extractelement <4 x i32> %x, i32 1		%x1 = extractelement <4 x i32> %x, i32 1
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x3 = extractelement <4 x i32> %x, i32 3		%x3 = extractelement <4 x i32> %x, i32 3
%c0 = icmp slt i32 %x0, 42		%c0 = icmp slt i32 %x0, 42
%c1 = icmp slt i32 %x1, 42		%c1 = icmp slt i32 %x1, 42
%c2 = icmp slt i32 %x2, 42		%c2 = icmp slt i32 %x2, 42
Show All 9 Lines	;
%s4 = select i1 %s3, i1 %d0, i1 false		%s4 = select i1 %s3, i1 %d0, i1 false
%s5 = select i1 %s4, i1 %d1, i1 false		%s5 = select i1 %s4, i1 %d1, i1 false
%s6 = select i1 %s5, i1 %d2, i1 false		%s6 = select i1 %s5, i1 %d2, i1 false
%s7 = select i1 %s6, i1 %d3, i1 false		%s7 = select i1 %s6, i1 %d3, i1 false
ret i1 %s7		ret i1 %s7
}		}

define i1 @logical_and_icmp_clamp_extra_use_select(<4 x i32> %x) {		define i1 @logical_and_icmp_clamp_extra_use_select(<4 x i32> %x) {
; CHECK-LABEL: @logical_and_icmp_clamp_extra_use_select(		; SSE-LABEL: @logical_and_icmp_clamp_extra_use_select(
; CHECK-NEXT: [[TMP1:%.*]] = icmp slt <4 x i32> [[X:%.*]], <i32 42, i32 42, i32 42, i32 42>		; SSE-NEXT: [[TMP1:%.*]] = icmp slt <4 x i32> [[X:%.*]], <i32 42, i32 42, i32 42, i32 42>
; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[X]], <i32 17, i32 17, i32 17, i32 17>		; SSE-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[X]], <i32 17, i32 17, i32 17, i32 17>
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i1> [[TMP1]], i32 0		; SSE-NEXT: [[TMP3:%.*]] = extractelement <4 x i1> [[TMP1]], i32 0
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i1> [[TMP1]], i32 1		; SSE-NEXT: [[TMP4:%.*]] = extractelement <4 x i1> [[TMP1]], i32 1
; CHECK-NEXT: [[S1:%.*]] = select i1 [[TMP3]], i1 [[TMP4]], i1 false		; SSE-NEXT: [[S1:%.*]] = select i1 [[TMP3]], i1 [[TMP4]], i1 false
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x i1> [[TMP1]], i32 2		; SSE-NEXT: [[TMP5:%.*]] = extractelement <4 x i1> [[TMP1]], i32 2
; CHECK-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 [[TMP5]], i1 false		; SSE-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 [[TMP5]], i1 false
; CHECK-NEXT: call void @use1(i1 [[S2]])		; SSE-NEXT: call void @use1(i1 [[S2]])
; CHECK-NEXT: [[TMP6:%.*]] = freeze <4 x i1> [[TMP2]]		; SSE-NEXT: [[TMP6:%.*]] = freeze <4 x i1> [[TMP2]]
; CHECK-NEXT: [[TMP7:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP6]])		; SSE-NEXT: [[TMP7:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP6]])
; CHECK-NEXT: [[TMP8:%.*]] = extractelement <4 x i1> [[TMP1]], i32 3		; SSE-NEXT: [[TMP8:%.*]] = extractelement <4 x i1> [[TMP1]], i32 3
; CHECK-NEXT: [[OP_RDX:%.*]] = select i1 [[TMP8]], i1 [[S2]], i1 false		; SSE-NEXT: [[OP_RDX:%.*]] = select i1 [[TMP8]], i1 [[S2]], i1 false
; CHECK-NEXT: [[OP_RDX1:%.*]] = select i1 [[TMP7]], i1 [[OP_RDX]], i1 false		; SSE-NEXT: [[OP_RDX1:%.*]] = select i1 [[TMP7]], i1 [[OP_RDX]], i1 false
; CHECK-NEXT: ret i1 [[OP_RDX1]]		; SSE-NEXT: ret i1 [[OP_RDX1]]
		;
		; AVX-LABEL: @logical_and_icmp_clamp_extra_use_select(
		; AVX-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0
		; AVX-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1
		; AVX-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2
		; AVX-NEXT: [[X3:%.*]] = extractelement <4 x i32> [[X]], i32 3
		; AVX-NEXT: [[C0:%.*]] = icmp slt i32 [[X0]], 42
		; AVX-NEXT: [[C1:%.*]] = icmp slt i32 [[X1]], 42
		; AVX-NEXT: [[C2:%.*]] = icmp slt i32 [[X2]], 42
		; AVX-NEXT: [[C3:%.*]] = icmp slt i32 [[X3]], 42
		; AVX-NEXT: [[D0:%.*]] = icmp sgt i32 [[X0]], 17
		; AVX-NEXT: [[D1:%.*]] = icmp sgt i32 [[X1]], 17
		; AVX-NEXT: [[D2:%.*]] = icmp sgt i32 [[X2]], 17
		; AVX-NEXT: [[D3:%.*]] = icmp sgt i32 [[X3]], 17
		; AVX-NEXT: [[S1:%.*]] = select i1 [[C0]], i1 [[C1]], i1 false
		; AVX-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 [[C2]], i1 false
		; AVX-NEXT: call void @use1(i1 [[S2]])
		; AVX-NEXT: [[S3:%.*]] = select i1 [[S2]], i1 [[C3]], i1 false
		; AVX-NEXT: [[S4:%.*]] = select i1 [[S3]], i1 [[D0]], i1 false
		; AVX-NEXT: [[S5:%.*]] = select i1 [[S4]], i1 [[D1]], i1 false
		; AVX-NEXT: [[S6:%.*]] = select i1 [[S5]], i1 [[D2]], i1 false
		; AVX-NEXT: [[S7:%.*]] = select i1 [[S6]], i1 [[D3]], i1 false
		; AVX-NEXT: ret i1 [[S7]]
;		;
%x0 = extractelement <4 x i32> %x, i32 0		%x0 = extractelement <4 x i32> %x, i32 0
%x1 = extractelement <4 x i32> %x, i32 1		%x1 = extractelement <4 x i32> %x, i32 1
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x3 = extractelement <4 x i32> %x, i32 3		%x3 = extractelement <4 x i32> %x, i32 3
%c0 = icmp slt i32 %x0, 42		%c0 = icmp slt i32 %x0, 42
%c1 = icmp slt i32 %x1, 42		%c1 = icmp slt i32 %x1, 42
%c2 = icmp slt i32 %x2, 42		%c2 = icmp slt i32 %x2, 42
▲ Show 20 Lines • Show All 59 Lines • ▼ Show 20 Lines	;
%s4 = select i1 %s3, i1 %d0, i1 false		%s4 = select i1 %s3, i1 %d0, i1 false
%s5 = select i1 %s4, i1 %d1, i1 false		%s5 = select i1 %s4, i1 %d1, i1 false
%s6 = select i1 %s5, i1 %d2, i1 false		%s6 = select i1 %s5, i1 %d2, i1 false
%s7 = select i1 %s6, i1 %d3, i1 false		%s7 = select i1 %s6, i1 %d3, i1 false
ret i1 %s7		ret i1 %s7
}		}

define i1 @logical_and_icmp_clamp_partial(<4 x i32> %x) {		define i1 @logical_and_icmp_clamp_partial(<4 x i32> %x) {
; SSE-LABEL: @logical_and_icmp_clamp_partial(		; CHECK-LABEL: @logical_and_icmp_clamp_partial(
; SSE-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 2		; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 2
; SSE-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[X]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[X]], i32 1
; SSE-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[X]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[X]], i32 0
; SSE-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> poison, i32 [[TMP2]], i32 0		; CHECK-NEXT: [[C0:%.*]] = icmp slt i32 [[TMP3]], 42
; SSE-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> [[TMP4]], i32 [[TMP3]], i32 1		; CHECK-NEXT: [[C1:%.*]] = icmp slt i32 [[TMP2]], 42
; SSE-NEXT: [[TMP6:%.*]] = icmp slt <2 x i32> [[TMP5]], <i32 42, i32 42>		; CHECK-NEXT: [[C2:%.*]] = icmp slt i32 [[TMP1]], 42
; SSE-NEXT: [[C2:%.*]] = icmp slt i32 [[TMP1]], 42		; CHECK-NEXT: [[TMP4:%.*]] = icmp sgt <4 x i32> [[X]], <i32 17, i32 17, i32 17, i32 17>
; SSE-NEXT: [[TMP7:%.*]] = icmp sgt <4 x i32> [[X]], <i32 17, i32 17, i32 17, i32 17>		; CHECK-NEXT: [[TMP5:%.*]] = freeze <4 x i1> [[TMP4]]
; SSE-NEXT: [[TMP8:%.*]] = freeze <4 x i1> [[TMP7]]		; CHECK-NEXT: [[TMP6:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP5]])
; SSE-NEXT: [[TMP9:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP8]])		; CHECK-NEXT: [[OP_RDX:%.*]] = select i1 [[C1]], i1 [[C0]], i1 false
; SSE-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP6]], i32 0		; CHECK-NEXT: [[OP_RDX1:%.*]] = select i1 [[OP_RDX]], i1 [[C2]], i1 false
; SSE-NEXT: [[TMP11:%.*]] = extractelement <2 x i1> [[TMP6]], i32 1		; CHECK-NEXT: [[OP_RDX2:%.*]] = select i1 [[TMP6]], i1 [[OP_RDX1]], i1 false
; SSE-NEXT: [[OP_RDX:%.*]] = select i1 [[TMP10]], i1 [[TMP11]], i1 false		; CHECK-NEXT: ret i1 [[OP_RDX2]]
; SSE-NEXT: [[OP_RDX1:%.*]] = select i1 [[OP_RDX]], i1 [[C2]], i1 false
; SSE-NEXT: [[OP_RDX2:%.*]] = select i1 [[TMP9]], i1 [[OP_RDX1]], i1 false
; SSE-NEXT: ret i1 [[OP_RDX2]]
;
; AVX-LABEL: @logical_and_icmp_clamp_partial(
; AVX-NEXT: [[TMP1:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 2
; AVX-NEXT: [[TMP2:%.*]] = extractelement <4 x i32> [[X]], i32 1
; AVX-NEXT: [[TMP3:%.*]] = extractelement <4 x i32> [[X]], i32 0
; AVX-NEXT: [[C0:%.*]] = icmp slt i32 [[TMP3]], 42
; AVX-NEXT: [[C1:%.*]] = icmp slt i32 [[TMP2]], 42
; AVX-NEXT: [[C2:%.*]] = icmp slt i32 [[TMP1]], 42
; AVX-NEXT: [[TMP4:%.*]] = icmp sgt <4 x i32> [[X]], <i32 17, i32 17, i32 17, i32 17>
; AVX-NEXT: [[TMP5:%.*]] = freeze <4 x i1> [[TMP4]]
; AVX-NEXT: [[TMP6:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP5]])
; AVX-NEXT: [[OP_RDX:%.*]] = select i1 [[C1]], i1 [[C0]], i1 false
; AVX-NEXT: [[OP_RDX1:%.*]] = select i1 [[OP_RDX]], i1 [[C2]], i1 false
; AVX-NEXT: [[OP_RDX2:%.*]] = select i1 [[TMP6]], i1 [[OP_RDX1]], i1 false
; AVX-NEXT: ret i1 [[OP_RDX2]]
;		;
%x0 = extractelement <4 x i32> %x, i32 0		%x0 = extractelement <4 x i32> %x, i32 0
%x1 = extractelement <4 x i32> %x, i32 1		%x1 = extractelement <4 x i32> %x, i32 1
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x3 = extractelement <4 x i32> %x, i32 3		%x3 = extractelement <4 x i32> %x, i32 3
%c0 = icmp slt i32 %x0, 42		%c0 = icmp slt i32 %x0, 42
%c1 = icmp slt i32 %x1, 42		%c1 = icmp slt i32 %x1, 42
%c2 = icmp slt i32 %x2, 42		%c2 = icmp slt i32 %x2, 42
; remove an element from the previous test		; remove an element from the previous test
%d0 = icmp sgt i32 %x0, 17		%d0 = icmp sgt i32 %x0, 17
%d1 = icmp sgt i32 %x1, 17		%d1 = icmp sgt i32 %x1, 17
%d2 = icmp sgt i32 %x2, 17		%d2 = icmp sgt i32 %x2, 17
%d3 = icmp sgt i32 %x3, 17		%d3 = icmp sgt i32 %x3, 17
%s1 = select i1 %c0, i1 %c1, i1 false		%s1 = select i1 %c0, i1 %c1, i1 false
%s2 = select i1 %s1, i1 %c2, i1 false		%s2 = select i1 %s1, i1 %c2, i1 false
; remove an element from the previous test		; remove an element from the previous test
%s4 = select i1 %s2, i1 %d0, i1 false		%s4 = select i1 %s2, i1 %d0, i1 false
%s5 = select i1 %s4, i1 %d1, i1 false		%s5 = select i1 %s4, i1 %d1, i1 false
%s6 = select i1 %s5, i1 %d2, i1 false		%s6 = select i1 %s5, i1 %d2, i1 false
%s7 = select i1 %s6, i1 %d3, i1 false		%s7 = select i1 %s6, i1 %d3, i1 false
ret i1 %s7		ret i1 %s7
}		}

define i1 @logical_and_icmp_clamp_pred_diff(<4 x i32> %x) {		define i1 @logical_and_icmp_clamp_pred_diff(<4 x i32> %x) {
; CHECK-LABEL: @logical_and_icmp_clamp_pred_diff(		; SSE-LABEL: @logical_and_icmp_clamp_pred_diff(
; CHECK-NEXT: [[TMP1:%.*]] = icmp slt <4 x i32> [[X:%.*]], <i32 42, i32 42, i32 42, i32 42>		; SSE-NEXT: [[TMP1:%.*]] = icmp slt <4 x i32> [[X:%.*]], <i32 42, i32 42, i32 42, i32 42>
; CHECK-NEXT: [[TMP2:%.*]] = icmp ult <4 x i32> [[X]], <i32 42, i32 42, i32 42, i32 42>		; SSE-NEXT: [[TMP2:%.*]] = icmp ult <4 x i32> [[X]], <i32 42, i32 42, i32 42, i32 42>
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i1> [[TMP1]], <4 x i1> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 2, i32 7>		; SSE-NEXT: [[TMP3:%.*]] = shufflevector <4 x i1> [[TMP1]], <4 x i1> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 2, i32 7>
; CHECK-NEXT: [[TMP4:%.*]] = icmp sgt <4 x i32> [[X]], <i32 17, i32 17, i32 17, i32 17>		; SSE-NEXT: [[TMP4:%.*]] = icmp sgt <4 x i32> [[X]], <i32 17, i32 17, i32 17, i32 17>
; CHECK-NEXT: [[TMP5:%.*]] = freeze <4 x i1> [[TMP4]]		; SSE-NEXT: [[TMP5:%.*]] = freeze <4 x i1> [[TMP4]]
; CHECK-NEXT: [[TMP6:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP5]])		; SSE-NEXT: [[TMP6:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP5]])
; CHECK-NEXT: [[TMP7:%.*]] = freeze <4 x i1> [[TMP3]]		; SSE-NEXT: [[TMP7:%.*]] = freeze <4 x i1> [[TMP3]]
; CHECK-NEXT: [[TMP8:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP7]])		; SSE-NEXT: [[TMP8:%.*]] = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> [[TMP7]])
; CHECK-NEXT: [[OP_RDX:%.*]] = select i1 [[TMP6]], i1 [[TMP8]], i1 false		; SSE-NEXT: [[OP_RDX:%.*]] = select i1 [[TMP6]], i1 [[TMP8]], i1 false
; CHECK-NEXT: ret i1 [[OP_RDX]]		; SSE-NEXT: ret i1 [[OP_RDX]]
		;
		; AVX-LABEL: @logical_and_icmp_clamp_pred_diff(
		; AVX-NEXT: [[X0:%.*]] = extractelement <4 x i32> [[X:%.*]], i32 0
		; AVX-NEXT: [[X1:%.*]] = extractelement <4 x i32> [[X]], i32 1
		; AVX-NEXT: [[X2:%.*]] = extractelement <4 x i32> [[X]], i32 2
		; AVX-NEXT: [[X3:%.*]] = extractelement <4 x i32> [[X]], i32 3
		; AVX-NEXT: [[C0:%.*]] = icmp slt i32 [[X0]], 42
		; AVX-NEXT: [[C1:%.*]] = icmp slt i32 [[X1]], 42
		; AVX-NEXT: [[C2:%.*]] = icmp slt i32 [[X2]], 42
		; AVX-NEXT: [[C3:%.*]] = icmp ult i32 [[X3]], 42
		; AVX-NEXT: [[D0:%.*]] = icmp sgt i32 [[X0]], 17
		; AVX-NEXT: [[D1:%.*]] = icmp sgt i32 [[X1]], 17
		; AVX-NEXT: [[D2:%.*]] = icmp sgt i32 [[X2]], 17
		; AVX-NEXT: [[D3:%.*]] = icmp sgt i32 [[X3]], 17
		; AVX-NEXT: [[S1:%.*]] = select i1 [[C0]], i1 [[C1]], i1 false
		; AVX-NEXT: [[S2:%.*]] = select i1 [[S1]], i1 [[C2]], i1 false
		; AVX-NEXT: [[S3:%.*]] = select i1 [[S2]], i1 [[C3]], i1 false
		; AVX-NEXT: [[S4:%.*]] = select i1 [[S3]], i1 [[D0]], i1 false
		; AVX-NEXT: [[S5:%.*]] = select i1 [[S4]], i1 [[D1]], i1 false
		; AVX-NEXT: [[S6:%.*]] = select i1 [[S5]], i1 [[D2]], i1 false
		; AVX-NEXT: [[S7:%.*]] = select i1 [[S6]], i1 [[D3]], i1 false
		; AVX-NEXT: ret i1 [[S7]]
;		;
%x0 = extractelement <4 x i32> %x, i32 0		%x0 = extractelement <4 x i32> %x, i32 0
%x1 = extractelement <4 x i32> %x, i32 1		%x1 = extractelement <4 x i32> %x, i32 1
%x2 = extractelement <4 x i32> %x, i32 2		%x2 = extractelement <4 x i32> %x, i32 2
%x3 = extractelement <4 x i32> %x, i32 3		%x3 = extractelement <4 x i32> %x, i32 3
%c0 = icmp slt i32 %x0, 42		%c0 = icmp slt i32 %x0, 42
%c1 = icmp slt i32 %x1, 42		%c1 = icmp slt i32 %x1, 42
%c2 = icmp slt i32 %x2, 42		%c2 = icmp slt i32 %x2, 42

llvm/test/Transforms/SLPVectorizer/X86/reduction2.ll

	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[C:%.*]], i32 1			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[C:%.*]], i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[FNEG]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[FNEG]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[C]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[C]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[B]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[B]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[MUL]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[MUL]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[MUL]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[MUL]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP6]]
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP7]], i32 1			; CHECK-NEXT: [[TMP8:%.*]] = fcmp olt <2 x double> [[TMP7]], <double 0x3EB0C6F7A0B5ED8D, double 0x3EB0C6F7A0B5ED8D>
	; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[TMP8]], 0x3EB0C6F7A0B5ED8D			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x i1> [[TMP8]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x double> [[TMP7]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP8]], i32 1
	; CHECK-NEXT: [[CMP4:%.*]] = fcmp olt double [[TMP9]], 0x3EB0C6F7A0B5ED8D			; CHECK-NEXT: [[OR_COND:%.*]] = and i1 [[TMP10]], [[TMP9]]
	; CHECK-NEXT: [[OR_COND:%.*]] = and i1 [[CMP]], [[CMP4]]
	; CHECK-NEXT: br i1 [[OR_COND]], label [[CLEANUP:%.*]], label [[LOR_LHS_FALSE:%.*]]			; CHECK-NEXT: br i1 [[OR_COND]], label [[CLEANUP:%.*]], label [[LOR_LHS_FALSE:%.*]]
	; CHECK: lor.lhs.false:			; CHECK: lor.lhs.false:
	; CHECK-NEXT: [[TMP10:%.*]] = fcmp ule <2 x double> [[TMP7]], <double 1.000000e+00, double 1.000000e+00>			; CHECK-NEXT: [[TMP11:%.*]] = fcmp ule <2 x double> [[TMP7]], <double 1.000000e+00, double 1.000000e+00>
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i1> [[TMP10]], i32 0			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i1> [[TMP11]], i32 0
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i1> [[TMP10]], i32 1			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x i1> [[TMP11]], i32 1
	; CHECK-NEXT: [[NOT_OR_COND9:%.*]] = or i1 [[TMP11]], [[TMP12]]			; CHECK-NEXT: [[NOT_OR_COND9:%.*]] = or i1 [[TMP12]], [[TMP13]]
	; CHECK-NEXT: ret i1 [[NOT_OR_COND9]]			; CHECK-NEXT: ret i1 [[NOT_OR_COND9]]
	; CHECK: cleanup:			; CHECK: cleanup:
	; CHECK-NEXT: ret i1 false			; CHECK-NEXT: ret i1 false
	;			;
	entry:			entry:
	%fneg = fneg double %b			%fneg = fneg double %b
	%add = fsub double %c, %b			%add = fsub double %c, %b
	%mul = fmul double %a, 2.000000e+00			%mul = fmul double %a, 2.000000e+00

llvm/test/Transforms/SLPVectorizer/X86/vect_copyable_in_binops.ll

; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4		; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4
; CHECK-NEXT: [[SUB:%.*]] = add nsw i32 [[TMP0]], -1		; CHECK-NEXT: [[SUB:%.*]] = add nsw i32 [[TMP0]], -1
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1
; CHECK-NEXT: store i32 [[SUB]], i32* [[DST]], align 4		; CHECK-NEXT: store i32 [[SUB]], i32* [[DST]], align 4
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2		; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4
; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2		; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2
; CHECK-NEXT: store i32 [[TMP1]], i32* [[INCDEC_PTR1]], align 4		; CHECK-NEXT: store i32 [[TMP1]], i32* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[INCDEC_PTR2]] to <2 x i32>*		; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3
; CHECK-NEXT: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* [[TMP2]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4
; CHECK-NEXT: [[TMP4:%.*]] = add nsw <2 x i32> [[TMP3]], <i32 -2, i32 -3>		; CHECK-NEXT: [[SUB5:%.*]] = add nsw i32 [[TMP2]], -2
; CHECK-NEXT: [[TMP5:%.*]] = sub nsw <2 x i32> [[TMP3]], <i32 -2, i32 -3>		; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3
; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> [[TMP5]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: store i32 [[SUB5]], i32* [[INCDEC_PTR3]], align 4
; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[INCDEC_PTR3]] to <2 x i32>*		; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[INCDEC_PTR4]], align 4
; CHECK-NEXT: store <2 x i32> [[TMP6]], <2 x i32>* [[TMP7]], align 4		; CHECK-NEXT: [[SUB8:%.*]] = sub nsw i32 [[TMP3]], -3
		; CHECK-NEXT: store i32 [[SUB8]], i32* [[INCDEC_PTR6]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1		%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1
%0 = load i32, i32* %src, align 4		%0 = load i32, i32* %src, align 4
%sub = add nsw i32 %0, -1		%sub = add nsw i32 %0, -1
%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1
store i32 %sub, i32* %dst, align 4		store i32 %sub, i32* %dst, align 4
%sub8 = sub nsw i32 %3, -3		%sub8 = sub nsw i32 %3, -3
store i32 %sub8, i32* %incdec.ptr6, align 4		store i32 %sub8, i32* %incdec.ptr6, align 4
ret void		ret void
}		}

define void @addsub1(i32* noalias %dst, i32* noalias %src) {		define void @addsub1(i32* noalias %dst, i32* noalias %src) {
; CHECK-LABEL: @addsub1(		; CHECK-LABEL: @addsub1(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 2		; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1
; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 2		; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4
; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[SRC]] to <2 x i32>*		; CHECK-NEXT: [[SUB:%.*]] = add nsw i32 [[TMP0]], -1
; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 4		; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1
; CHECK-NEXT: [[TMP2:%.*]] = add nsw <2 x i32> [[TMP1]], <i32 -1, i32 -1>		; CHECK-NEXT: store i32 [[SUB]], i32* [[DST]], align 4
; CHECK-NEXT: [[TMP3:%.*]] = sub nsw <2 x i32> [[TMP1]], <i32 -1, i32 -1>		; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> [[TMP3]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4
; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[DST]] to <2 x i32>*		; CHECK-NEXT: [[SUB1:%.*]] = sub nsw i32 [[TMP1]], -1
; CHECK-NEXT: store <2 x i32> [[TMP4]], <2 x i32>* [[TMP5]], align 4		; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2
		; CHECK-NEXT: store i32 [[SUB1]], i32* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3		; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3
; CHECK-NEXT: [[TMP6:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4
; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3		; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3
; CHECK-NEXT: store i32 [[TMP6]], i32* [[INCDEC_PTR3]], align 4		; CHECK-NEXT: store i32 [[TMP2]], i32* [[INCDEC_PTR3]], align 4
; CHECK-NEXT: [[TMP7:%.*]] = load i32, i32* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[INCDEC_PTR4]], align 4
; CHECK-NEXT: [[SUB8:%.*]] = sub nsw i32 [[TMP7]], -3		; CHECK-NEXT: [[SUB8:%.*]] = sub nsw i32 [[TMP3]], -3
; CHECK-NEXT: store i32 [[SUB8]], i32* [[INCDEC_PTR6]], align 4		; CHECK-NEXT: store i32 [[SUB8]], i32* [[INCDEC_PTR6]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1		%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1
%0 = load i32, i32* %src, align 4		%0 = load i32, i32* %src, align 4
%sub = add nsw i32 %0, -1		%sub = add nsw i32 %0, -1
%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1

Revision Contents

Diff 436911

llvm/include/llvm/Analysis/TargetTransformInfo.h

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

llvm/include/llvm/CodeGen/BasicTTIImpl.h

llvm/lib/Analysis/TargetTransformInfo.cpp

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp

llvm/lib/Target/X86/X86TargetTransformInfo.h

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions.ll

llvm/test/Transforms/PhaseOrdering/fast-basictest.ll

llvm/test/Transforms/SLPVectorizer/AArch64/memory-runtime-checks.ll

llvm/test/Transforms/SLPVectorizer/RISCV/rvv-min-vector-size.ll

llvm/test/Transforms/SLPVectorizer/X86/PR31847.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-fp-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-fp.ll

llvm/test/Transforms/SLPVectorizer/X86/bool-mask.ll

llvm/test/Transforms/SLPVectorizer/X86/c-ray.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_binaryop.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_bullet.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_reordering_undefs.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_sim4b1.ll

llvm/test/Transforms/SLPVectorizer/X86/cse.ll

llvm/test/Transforms/SLPVectorizer/X86/extractcost.ll

llvm/test/Transforms/SLPVectorizer/X86/geps-non-pow-2.ll

llvm/test/Transforms/SLPVectorizer/X86/horizontal-minmax.ll

llvm/test/Transforms/SLPVectorizer/X86/horizontal.ll

llvm/test/Transforms/SLPVectorizer/X86/lookahead.ll

llvm/test/Transforms/SLPVectorizer/X86/minimum-sizes.ll

llvm/test/Transforms/SLPVectorizer/X86/pr35497.ll

llvm/test/Transforms/SLPVectorizer/X86/pr47629-inseltpoison.ll

llvm/test/Transforms/SLPVectorizer/X86/pr47629.ll

llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll

llvm/test/Transforms/SLPVectorizer/X86/reduction2.ll

llvm/test/Transforms/SLPVectorizer/X86/vect_copyable_in_binops.ll