
[LV] Drop integer poison-generating flags from instructions that need predication
Closed, Public

Authored by dcaballe on Oct 14 2021, 3:12 PM.

Details

Summary

This patch fixes PR52111. The problem is that LV propagates poison-generating flags (nuw/nsw, exact,
and inbounds) on instructions that contribute to the address computation of widened loads/stores that are
guarded by a condition. When the code is vectorized and the control flow within the loop is linearized,
these flags may lead to a poison value that is effectively used as the base address
of the widened load/store. For example, in the following loop:

loop2.header:
  %iv2 = phi i64 [ 0, %loop1.header ], [ %iv2.inc, %if.end ]
  %i23 = icmp eq i64 %iv2, 0
  br i1 %i23, label %if.end, label %if.then

if.then:
  %i27 = sub nuw nsw i64 %iv2, 1
  %i29 = getelementptr inbounds [1 x [33 x float]], [1 x [33 x float]]* %input, i64 0, i64 %iv1, i64 %i27
...

'%i27' is a uniform operation related to address computation, and it is guarded by condition
'%i23' (%iv2 > 0). Its nuw flag is only valid because of this condition: the condition ensures
that the value of '%iv2' reaching '%i27' is always greater than zero and, consequently, that
the result of '%i27' never wraps. However, after vectorizing and linearizing the
control flow, '%i27' (uniform, address computation) is also executed for '%iv2 = 0', and
the nuw flag becomes invalid.

The fix drops all the integer poison-generating flags from instructions that contribute to the
address computation of a widened load/store whose original (scalar) instruction was in a basic block
that needed predication but is not predicated after vectorization.
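
To make the failure mode concrete, below is a hand-written, simplified sketch of a linearized vector body for a loop of this shape. It is not the vectorizer's actual output; VF=4, the trip count, and all value names are invented. The point is that the sub, no longer guarded, also runs for the lane where %iv2 == 0, and its first lane feeds the base address of the masked load, so keeping nuw/nsw on it (and inbounds on the GEP) would make that base address poison:

define void @sketch(float* %input, float* %output) {
entry:
  br label %vector.body

vector.body:
  %index = phi i64 [ 0, %entry ], [ %index.next, %vector.body ]
  %vec.iv = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, %entry ], [ %vec.iv.next, %vector.body ]
  ; Mask of the original if.then block: lanes where the scalar %iv2 != 0.
  %mask = icmp ne <4 x i64> %vec.iv, zeroinitializer
  ; No longer guarded: lane 0 of the first iteration computes 0 - 1, so the
  ; nuw/nsw of the scalar sub must be dropped or this lane would be poison.
  %sub = sub <4 x i64> %vec.iv, <i64 1, i64 1, i64 1, i64 1>
  ; Lane 0 of %sub becomes the base address of the masked load, so it is
  ; used even though lane 0 of the loaded data is masked out.
  %lane0 = extractelement <4 x i64> %sub, i32 0
  %gep = getelementptr float, float* %input, i64 %lane0
  %vec.ptr = bitcast float* %gep to <4 x float>*
  %wide.load = call <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>* %vec.ptr, i32 4, <4 x i1> %mask, <4 x float> undef)
  ; Masked-out lanes get 0.0, mirroring the phi of the scalar loop.
  %sel = select <4 x i1> %mask, <4 x float> %wide.load, <4 x float> zeroinitializer
  %out.gep = getelementptr float, float* %output, i64 %index
  %out.ptr = bitcast float* %out.gep to <4 x float>*
  store <4 x float> %sel, <4 x float>* %out.ptr, align 4
  %index.next = add i64 %index, 4
  %vec.iv.next = add <4 x i64> %vec.iv, <i64 4, i64 4, i64 4, i64 4>
  %done = icmp eq i64 %index.next, 32
  br i1 %done, label %exit, label %vector.body

exit:
  ret void
}

declare <4 x float> @llvm.masked.load.v4f32.p0v4f32(<4 x float>*, i32 immarg, <4 x i1>, <4 x float>)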


Event Timeline


Fixed RISCV test

spatel accepted this revision. Oct 22 2021, 6:57 AM

LGTM - the use of the standard dropPoisonGeneratingFlags is good, and the code comment about inbounds seems right, but I'm not familiar enough with LV (and these tests specifically) to know if there's more/less we should/could do, so please wait for at least one more reviewer to sign off.

This revision is now accepted and ready to land. Oct 22 2021, 6:57 AM

Thanks, Sanjay! Anything else, Florian/Roger/Roman?

lebedev.ri added a subscriber: nlopes.

I think this is right, but just to be extra sure, let's ask @nlopes ?

nlopes added a comment (edited). Oct 27 2021, 9:24 AM

I don't know the loop vectorizer code, but I have two concerns.

  1. Correctness:

The underlying issue in the bug report is that we widen a load into a masked vector load that takes only the address of the first iteration. We need to ensure this address is not poison if it wouldn't be poison for the non-masked-out loads.
Dropping poison-producing flags from any operation hoisted out of conditional BBs that contributes to the address is a good step forward. But it doesn't seem enough.
Consider this example:

loop:
  %i = phi [ poison, %entry ], [ %i3, %loop ]
  %i2 = phi [ 0, %entry ], [ %i3, %loop ]
  if (%i2 > 0) {
    %p = gep %p, %i
    load %p
  }
  %i3 = add %i2, 1
  br %cond, %loop, ...

Now vectorize that load into a masked load of %p and we are doomed, because %p is poison in the first iteration.
I'm not sure this example would kick in with the vectorizer, as it needs to prove that the loads are contiguous, but maybe SCEV will take the poison as 0 and vectorization kicks in. Anyway, the point is that just dropping poison-producing attributes may not be enough.

The second point is that it seems the patch drops attributes from all hoisted instructions, but that's not strictly needed. You only need to drop attributes from instructions that contribute to instructions that are widened and that produce UB if given poison. I don't think LLVM can produce e.g. a division of a vector by a scalar (if the loop always divides by the same value).

Thanks for the feedback, Nuno!

Dropping poison-producing flags from any operation hoisted out of conditional BBs that contributes to the address is a good step forward. But it doesn't seem enough.

That's a great point. One option could be avoiding masking in basic blocks with direct or indirect uses of poison and scalarize that code instead. We could even only scalarize the impacted instructions and apply masking to the rest. I think addressing those cases would require much more involved changes. I agree that the current fix is a good step forward. We can address more complex cases incrementally.

The second point is that it seems the patch drops attributes from all hoisted instructions, but that's not strictly needed. You only need to drop attributes from instructions that contribute to instructions that are widened and that produce UB if given poison. I don't think LLVM can produce e.g. a division of a vector by a scalar (if the loop always divides by the same value).

LV can produce a division of a vector by a scalar by broadcasting the scalar into a vector. Is that what you mean?

I think the key points here are the instructions with the attributes and their guarding predicate. These instructions, *regardless of their uses*, may produce a poison value and UB themselves if the guarding predicate is dropped as a result of vectorization because they will be executed for iterations of the loop that wouldn't be executed in the original scalar loop. For instance, in the motivating example, a sub instruction will produce a poison value itself if its guarding predicate is dropped, regardless of whether the following gep/load using the result of the sub is widened or kept scalar. There are passes that will optimize away poison-generating instructions (i.e., this sub), leading to UB. For example, InstructionSimplify performs some of these optimizations (I think it turns this sub into a 0). Note that we are not blindly dropping all the poison-generating flags from all the instructions. We make sure that we only drop these flags from instructions that were originally guarded by a predicate that has been dropped as a result of vectorization. If those instructions are scalarized (i.e., the predicate is preserved), only interleaved, etc., we are not dropping their flags.

Hopefully, that helps! Please, let me know if you have any other comments.

Thanks,
Diego

The second point is that it seems the patch drops attributes from all hoisted instructions, but that's not strictly needed. You only need to drop attributes from instructions that contribute to instructions that are widened and that produce UB if given poison. I don't think LLVM can produce e.g. a division of a vector by a scalar (if the loop always divides by the same value).

LV can produce a division of a vector by a scalar by broadcasting the scalar into a vector. Is that what you mean?

My point was that load/store are special in how they are widened. A vector load of %p is equivalent to load %p, load %p+1, load %p+2, etc.
So internally the load produces the addresses that would have been used in subsequent iterations.
The root of the problem being addressed here is that we only pass the first operand and have the subsequent ones computed internally by the operation. If we hit the case where the 1st iteration is the one masked out, we may be in trouble.
It's a pretty cool bug! :)

Two conclusions from this:

  • If the 1st iteration (of the vectorized bundle) isn't masked out, we are good: no need to drop any attributes. This is because we would hit any UB in the original program anyway.
  • We only need to drop attributes from instructions that flow into the operands of instructions that have the property of internally computing the operands for subsequent sub-operations, like load/store that increment pointers. I don't think LLVM has any operation other than load/store with this property?

This means the fix should be limited to operands of load/store and for when only the 1st iteration is masked out.
Right now it seems that the code will drop attributes from any instruction regardless of whether it flows into a load/store or not. It's way too conservative. I would rather see it fixed properly now than get a promise of a fix in the future (which statistically rarely happens in the LLVM community).

To make things fully correct, there's also the concern that we need to ensure the value of the 1st iteration isn't itself already poison. This is a tricky dance with SCEV and I don't know if the example I posted previously would kick in right now, but could in the future, so at least we could add that as a unit test to make sure we don't regress if SCEV becomes smarter.

I think the key points here are the instructions with the attributes and their guarding predicate. These instructions, *regardless of their uses*, may produce a poison value and UB themselves if the guarding predicate is dropped as a result of vectorization because they will be executed for iterations of the loop that wouldn't be executed in the original scalar loop.

It's totally fine to execute instructions that yield poison. They won't lead to UB unless used.
Instructions that may produce UB themselves can't be hoisted unless predicated, but I hope that's already accounted for.

If the 1st iteration (of the vectorized bundle) isn't masked out, we are good: no need to drop any attributes. This is because we would hit any UB in the original program anyway.
We only need to drop attributes from instructions that flow into the operands of instructions that have the property of internally computing the operands for subsequent sub-operations, like load/store that increment pointers. I don't think LLVM has any operation other than load/store with this property?

I initially focused on fixing only the address-computation cases, but the ongoing discussion made me realize I was oversimplifying the problem and that we should be more conservative and generalize the fix to more cases. What is not clear to me is what a valid case is. Hopefully, you can help me understand. Should dropping poison-generating flags be UB-driven only, or should the flags be semantically correct based on the definition of each poison-generating flag? A few related questions for my understanding:

  • Should a regular vector add (unmasked) keep the nsw/nuw flags if one vector lane X might overflow?
  • What if the result of the vector add is used by a masked vector store which masks out lane X?
  • Should a GEP instruction keep the inbounds flag if after vectorization the computed address is actually out-of-bounds but out-of-bounds elements are masked out by the consumer masked load/store/gather/scatter?
  • What about vector instructions feeding a vector GEP feeding a gather/scatter?
  • What about FMF (https://reviews.llvm.org/D112117)?

I think I need an answer to these questions to really understand what is needed. Hopefully, you can help me with this. It sounds really interesting! :).

The definition of the poison-generating flags seems ambiguous for vector types. If the properties of a flag might not hold within the context of the instruction itself and for all the vector lanes, wouldn’t it be semantically incorrect to keep that flag? Keeping the flag because the user of that instruction will mask out the invalid lanes sounds concerning to me but maybe I’m wrong here.

This means the fix should be limited to operands of load/store and for when only the 1st iteration is masked out.

If we wanted to go this way, we would have to prove that the first vector lane of at least one vector iteration is masked out, not necessarily the first lane of the first vector iteration. It seems complicated to prove that accurately. Any ideas? The guarding condition could be complex and missing just one flag would defeat the purpose of the fix. Maybe we could consider dropping all the flags in instructions involved in address computation. Would that be reasonable?

This is a tricky dance with SCEV and I don't know if the example I posted previously would kick in right now, but could in the future, so at least we could add that as a unit test to make sure we don't regress if SCEV becomes smarter.

LV bails out because the first phi node is not supported by the vectorizer. I can definitely add a test for this case.
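
For reference, a hand-written IR version of Nuno's earlier example that such a test could start from (function signature, types, and the stored use are assumptions, not taken from the bug report):

define void @poison_phi_addr(float* %p, float* %out, i64 %n) {
entry:
  br label %loop

loop:
  %i = phi i64 [ poison, %entry ], [ %i3, %latch ]
  %i2 = phi i64 [ 0, %entry ], [ %i3, %latch ]
  %c = icmp sgt i64 %i2, 0
  br i1 %c, label %if.then, label %latch

if.then:
  ; %i is poison on the first iteration, but this block is not reached then,
  ; so the scalar loop never uses the poison address.
  %addr = getelementptr inbounds float, float* %p, i64 %i
  %v = load float, float* %addr
  %out.addr = getelementptr inbounds float, float* %out, i64 %i2
  store float %v, float* %out.addr
  br label %latch

latch:
  %i3 = add nuw nsw i64 %i2, 1
  %cont = icmp slt i64 %i3, %n
  br i1 %cont, label %loop, label %exit

exit:
  ret void
}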

It would be great to know what the other reviewers think about this!

Thanks,
Diego

  • Should a regular vector add (unmasked) keep the nsw/nuw flags if one vector lane X might overflow?

Yes. The result won't be used if it wasn't used by the original program (except for widened loads/stores, as discussed before)

  • What if the result of the vector add is used by a masked vector store which masks out lane X?

No problem. A masked store is equivalent to:

if (mask[i])
  store v[i], p[i]

So v[i] can be poison because it won't be used.

  • Should a GEP instruction keep the inbounds flag if after vectorization the computed address is actually out-of-bounds but out-of-bounds elements are masked out by the consumer masked load/store/gather/scatter?

Yep, same as the previous. But as long as the address is not used for a widened load/store.

  • What about vector instructions feeding a vector GEP feeding a gather/scatter?

That depends. Was the original program using that same value? If so, no problem. Otherwise I need a concrete example as I can't imagine one right away :)

The problem with floats would be signaling NaNs, I guess. It wouldn't be correct to execute those speculatively. But that's a totally different problem.
If it's just about FMF producing poison, then no worries and that patch doesn't seem necessary (because if something is broken, it's a much bigger problem).

The definition of the poison-generating flags seems ambiguous for vector types. If the properties of a flag might not hold within the context of the instruction itself and for all the vector lanes, wouldn’t it be semantically incorrect to keep that flag? Keeping the flag because the user of that instruction will mask out the invalid lanes sounds concerning to me but maybe I’m wrong here.

It's totally fine to execute instructions that produce poison as long as their value is not used. If you use them in a masked_store, the masked out values are not used.
This is correct:

if (i != 1)
  r[i] = a[i] +nsw b[i]
=>
tmp = a[i..i+3] +nsw b[i..i+3]
store tmp, r, <1,0,1,1>

Sorry, I don't know the exact syntax, but the point is that you can execute the add for i == 1 with nsw as you'll never use the result.
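
In actual IR syntax, the sketch above corresponds roughly to the following (hand-written illustration; VF=4 and i32 elements are assumed). The second mask bit is false, matching the i != 1 guard, so the possibly-poison lane of %tmp is never stored:

define void @masked_store_keeps_nsw(<4 x i32> %a, <4 x i32> %b, <4 x i32>* %r) {
  ; Lane 1 of %tmp may be poison if that lane overflows, but its mask bit
  ; below is false, so the masked store never uses it.
  %tmp = add nsw <4 x i32> %a, %b
  call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> %tmp, <4 x i32>* %r, i32 4, <4 x i1> <i1 true, i1 false, i1 true, i1 true>)
  ret void
}

declare void @llvm.masked.store.v4i32.p0v4i32(<4 x i32>, <4 x i32>*, i32 immarg, <4 x i1>)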

This means the fix should be limited to operands of load/store and for when only the 1st iteration is masked out.

If we wanted to go this way, we would have to prove that the first vector lane of at least one vector iteration is masked out, not necessarily the first lane of the first vector iteration. It seems complicated to prove that accurately. Any ideas? The guarding condition could be complex and missing just one flag would defeat the purpose of the fix. Maybe we could consider dropping all the flags in instructions involved in address computation. Would that be reasonable?

If by that you mean dropping flags from instructions whose values flow into addresses of widened load/store operations, yes! That's a good way of fixing the 1st issue.
Unless you know something about the mask. If the 1st lane of every iteration is not masked out, you don't need to do anything as that value would be dereferenced in the original program.

This is a tricky dance with SCEV and I don't know if the example I posted previously would kick in right now, but could in the future, so at least we could add that as a unit test to make sure we don't regress if SCEV becomes smarter.

LV bails out because the first phi node is not supported by the vectorizer. I can definitely add a test for this case.

That's great! I was worried your bug report exposes 2 separate problems. I just didn't know if the 2nd one happened in practice or not.

I'm happy to answer further questions, of course. If you have concrete examples in mind even better as it's easier to communicate I think.

Thanks a lot for the explanation and the quick response...

  • Should a GEP instruction keep the inbounds flag if after vectorization the computed address is actually out-of-bounds but out-of-bounds elements are masked out by the consumer masked load/store/gather/scatter?

Yep, same as the previous. But as long as the address is not used for a widened load/store.

Let me clarify... The GEP will feed a masked load/store. We won't load the data in that out-of-bounds address (masked out) but that address will be used as base for the masked load/store. It sounds like a case similar to the one exposing the bug. Since the address will be used, I understand we should drop the inbounds.

What about vector instructions feeding a vector GEP feeding a gather/scatter?

That depends. Was the original program using that same value? If so, no problem. Otherwise I need a concrete example as I can't imagine one right away :)

Gathers/scatters are modeled as a vector of pointers using a vector GEP so if an address is poison it will be masked out, at least, initially. Unfortunately, some backends will turn this vector of pointers into a single base pointer + offsets. If the base pointer is poison, we have the same issue as with masked loads/stores. This is basically a vector variant of that problem. I would suggest dropping the flags also here, following the same logic. Does it make sense?
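
For reference, a hand-written sketch of such a gather (element type, names, and alignment are assumptions): the vector GEP yields one pointer per lane, and the gather itself never dereferences masked-out lanes, but a backend that rewrites it as a single base pointer plus offsets does use one lane's pointer as the base.

define <4 x float> @gather_sketch(float* %base, <4 x i64> %idx, <4 x i1> %mask) {
  ; One pointer per lane; a poison pointer in a masked-out lane is fine for
  ; the gather itself, but not for a backend that picks it as the base.
  %ptrs = getelementptr inbounds float, float* %base, <4 x i64> %idx
  %g = call <4 x float> @llvm.masked.gather.v4f32.v4p0f32(<4 x float*> %ptrs, i32 4, <4 x i1> %mask, <4 x float> undef)
  ret <4 x float> %g
}

declare <4 x float> @llvm.masked.gather.v4f32.v4p0f32(<4 x float*>, i32 immarg, <4 x i1>, <4 x float>)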

The problem with floats would be signaling NaNs, I guess. It wouldn't be correct to execute those speculatively. But that's a totally different problem.
If it's just about FMF producing poison, then no worries and that patch doesn't seem necessary (because if something is broken, it's a much bigger problem).

Ok.

Maybe we could consider dropping all the flags in instructions involved in address computation. Would that be reasonable?

If by that you mean dropping flags from instructions whose values flow into addresses of widened load/store operations, yes! That's a good way of fixing the 1st issue.
Unless you know something about the mask. If the 1st lane of every iteration is not masked out, you don't need to do anything as that value would be dereferenced in the original program.

Ok, I can add a check that looks for a widen load/store in the def-use chain.

Thanks!

Thanks a lot for the explanation and the quick response...

  • Should a GEP instruction keep the inbounds flag if after vectorization the computed address is actually out-of-bounds but out-of-bounds elements are masked out by the consumer masked load/store/gather/scatter?

Yep, same as the previous. But as long as the address is not used for a widened load/store.

Let me clarify... The GEP will feed a masked load/store. We won't load the data in that out-of-bounds address (masked out) but that address will be used as base for the masked load/store. It sounds like a case similar to the one exposing the bug. Since the address will be used, I understand we should drop the inbounds.

A masked load of a masked-out lane is essentially a NOP, so the address can be poison as it's not actually used. You don't need to drop inbounds.
LangRef doesn't even require the address of masked-out lanes to be aligned, unlike, e.g., memcpy, where even with size=0 the address must be properly aligned.

What about vector instructions feeding a vector GEP feeding a gather/scatter?

That depends. Was the original program using that same value? If so, no problem. Otherwise I need a concrete example as I can't imagine one right away :)

Gathers/scatters are modeled as a vector of pointers using a vector GEP so if an address is poison it will be masked out, at least, initially. Unfortunately, some backends will turn this vector of pointers into a single base pointer + offsets. If the base pointer is poison, we have the same issue as with masked loads/stores. This is basically a vector variant of that problem. I would suggest dropping the flags also here, following the same logic. Does it make sense?

Sounds like the same problem, yes. But just dropping flags isn't enough, because similarly the base pointer could have been poison already.
Those backends are buggy. Fixing this requires either proving that the base pointer isn't poison in the first place, or deriving a base address from a non-masked-out lane (probably the easiest solution?).

Let me clarify... The GEP will feed a masked load/store. We won't load the data in that out-of-bounds address (masked out) but that address will be used as base for the masked load/store. It sounds like a case similar to the one exposing the bug. Since the address will be used, I understand we should drop the inbounds.

A masked load of a masked-out lane is essentially a NOP, so the address can be poison as it's not actually used. You don't need to drop inbounds.
LangRef doesn't even require the address of masked-out lanes to be aligned, unlike, e.g., memcpy, where even with size=0 the address must be properly aligned.

Ok, let me update the patch so that we can discuss actual examples. I have the impression that we both agree but might be looking at it from different perspectives.

Gathers/scatters are modeled as a vector of pointers using a vector GEP so if an address is poison it will be masked out, at least, initially. Unfortunately, some backends will turn this vector of pointers into a single base pointer + offsets. If the base pointer is poison, we have the same issue as with masked loads/stores. This is basically a vector variant of that problem. I would suggest dropping the flags also here, following the same logic. Does it make sense?

Sounds like the same problem, yes. But just dropping flags isn't enough, because similarly the base pointer could have been poison already.
Those backends are buggy. Fixing this requires either proving that the base pointer isn't poison in the first place, or deriving a base address from a non-masked-out lane (probably the easiest solution?).

Deriving a base address from a non-masked out lane makes sense but we don't always have that information at compile time. In any case, this patch is a good starting point to fix the backend problems. We can iterate on it later as we see fit.

dcaballe updated this revision to Diff 383687. Sun, Oct 31, 3:01 PM
  • Added check to make sure we only drop poison-generating flags from instructions contributing to the address computation of masked loads/stores.
  • Removed logic to drop flags from widen GEPs (for gathers/scatters)
  • Removed logic to drop flags from all the widen instructions.
  • Reverted changes in impacted tests.
nlopes added a comment. Wed, Nov 3, 6:52 AM

Looks good to me. Added a couple of suggestions to the code.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3144

Isn't it inefficient to recompute this information for every operation? You get an O(n^2) behavior (up to n calls of this function, each traversing up to n instructions).

3151

An address computation may depend on float operations, e.g., a float-to-int conversion, a select based on a float comparison, etc.

3166

You don't need to consider users of loads (nor of stores). You can break here, if you want, when WidenMemRec != nullptr.

dcaballe updated this revision to Diff 384745. Thu, Nov 4, 7:17 AM
dcaballe marked 2 inline comments as done.

Addressed feedback.
Changed the approach to improve the complexity of detecting the
poison-generating recipes. We now gather these recipes before
executing the VPlan, starting from the recipes generating a widen
load/store and traversing the backward slice from their address
operand. In this way, recipes are only visited once.

Please, let me know if there is any other comment.

Thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3144

I changed the approach so that we gather the target recipes before executing the VPlan.

3151

True, removed that check.

3166

It makes sense. Done. Updating doc accordingly.

nlopes accepted this revision. Thu, Nov 4, 12:26 PM

looks correct to me. Left just one perf suggestion.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1195

This visited set can be hoisted outside the lambda. You should only traverse each instruction at most once, as you need to drop poison flags if the instruction contributes to any address (it doesn't matter which).

fhahn added a comment. Sun, Nov 7, 1:16 AM

Thanks for the update! I think the description/title would need a minor update, as they still talk about 'integer' poison generating flags.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
889

This is only used during codegen, right? In that case, it would probably be better to just add it to VPTransformState instead. Otherwise it just adds a little bit of additional coupling between ILV and codegen, making it harder to separate in the future.

1219

Is there a way to avoid querying information from legal here?

As we are interested in any operation contributing to an address computation of a masked memory operation, could we only seed the worklist with addresses of masked operations instead (both VPWidenMemoryInstructionRecipe & VPInterleaveRecipe have hasMask/getMask helpers)? Then we would not need to check legal here, right?

Checking Instr's location here might not be sufficient in any case, I think. There could be cases where Instr is in a block that does not need predication but feeds a memory instruction in a block that needs predication. In those cases, I think we should also drop the flags from Instr. In any case, it might be good to add such a test case.

1225

Do we in some cases also need to drop flags from recipes other than VPReplicateRecipe?

If we need to drop flags from recipes in unpredicated blocks (see comment above), then we could have cases where a vector operation (e.g. VPWidenRecipe) feeds the address computation (the first lane will get extracted). An example could be

define void @drop_vector(float* noalias nocapture readonly %input,
                                 float* %output, i64* noalias %ints) local_unnamed_addr #0 {
entry:
  br label %loop.header

loop.header:
  %iv = phi i64 [ 0, %entry ], [ %iv.inc, %if.end ]
  %i23 = icmp eq i64 %iv, 0
  %gep = getelementptr inbounds i64, i64* %ints, i64 %iv
  %lv = load i64, i64* %gep
  %i27 = sub nuw nsw i64 %iv, 1
  %r = add i64 %lv, %i27
  store i64 %r, i64* %gep
  br i1 %i23, label %if.end, label %if.then

if.then:
  %i29 = getelementptr inbounds float, float* %input, i64 %i27
  %i30 = load float, float* %i29, align 4, !invariant.load !0
  br label %if.end

if.end:
  %i34 = phi float [ 0.000000e+00, %loop.header ], [ %i30, %if.then ]
  %i35 = getelementptr inbounds float, float* %output, i64 %iv
  store float %i34, float* %i35, align 4
  %iv.inc = add nuw nsw i64 %iv, 1
  %exitcond = icmp eq i64 %iv.inc, 4
  br i1 %exitcond, label %loop.exit, label %loop.header

loop.exit:
  ret void
}
attributes #0 = { noinline nounwind uwtable "target-features"="+avx512bw,+avx512cd,+avx512dq,+avx512f,+avx512vl" }

!0 = !{}

Thanks for the feedback, Florian! I added some comments. Working on some of the changes while waiting for some clarifications. Thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
889

Sure, will do!

1195

It makes sense. Will do.

1219

As we are interested in any operation contributing to an address computation of a masked memory operation, could we only seed the worklist with addresses of masked operations instead (both VPWidenMemoryInstructionRecipe & VPInterleaveRecipe have hasMask/getMask helpers)? Then we would not need to check legal here, right?

I had tried using getMask already. Unfortunately, there are cases where the loads are predicated but the mask has been dropped in favor of executing them speculatively unmasked. See https://reviews.llvm.org/D66688 and test load-deref-pred.ll. We should drop the flags from these cases as well and the only way I found is using Legal->blockNeedsPredication. I know it's not ideal but Legal is still widely used during codegen.

Checking Instr's location here might not be sufficient in any case, I think. There could be cases where Instr is in a block that does not need predication but feeds a memory instruction in a block that needs predication. In those cases, I think we should also drop the flags from Instr. In any case, it might be good to add such a test case.

I think that makes sense. It would be a case similar to the one that exposed the bug, but with the sub instruction moved outside the condition. I'll give it a try!

1225

I assumed that the address computation would be uniform for those cases and left scalar; wouldn't it be? I'll give it a try, but there has been some back and forth wrt dropping the flags from vector instructions already. Based on previous feedback, I'm not sure if we should drop the flags from them when there are lanes that won't generate poison. I would need some clarification before proceeding with those changes (@nlopes).

dcaballe added inline comments. Sun, Nov 7, 3:00 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1225

Looking closely at your example, the poison value generated by 'sub' in the first iteration of the loop is actually used in the scalar version of the code (it is stored). For that reason, my impression is that having the nuw flag in the input sub would be incorrect, isn't that the case?

dcaballe updated this revision to Diff 385340. Sun, Nov 7, 6:32 AM
  • Added support for non-predicated poison-generating instruction cases.
  • Move MayGeneratePoisonRecipes to VPTransformState.
  • Update documentation.
dcaballe updated this revision to Diff 385341. Sun, Nov 7, 6:43 AM
dcaballe marked 3 inline comments as done.

Moved visited set outside of lambda function

dcaballe added inline comments. Sun, Nov 7, 6:46 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1219

getMask can't be used for the interleave recipe either. There are interleave groups that are formed out of non-predicated loads/stores. However, the resulting interleave group may still require a mask if the to-be-generated widened load reads more elements than those needed by the group. We should not drop poison flags in those cases.

dcaballe edited the summary of this revision. Sun, Nov 7, 7:58 AM
nlopes added inline comments. Sun, Nov 7, 12:00 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1225

I mentioned this issue before: what happens if a value was already poison in the first place?

I suggest you commit this patch first, as it addresses half of the problem, and then we can discuss the best way to fix the second part. Even dropping flags from all instructions within the function isn't sufficient, as you may get a poison value as input (change the example above to have [ %arg, %entry ] rather than [ 0, %entry ] for %iv). Though you only need to patch the code for the cases where you cannot prove the code would execute in the scalar version. The right fix isn't trivial.

dcaballe updated this revision to Diff 385436. Mon, Nov 8, 3:08 AM

Fixed failing test

Ok, I'll wait for Florian's approval to land this patch.

Thanks!
Diego

fhahn added inline comments. Mon, Nov 8, 1:49 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1219

I had tried using getMask already. Unfortunately, there are cases where the loads are predicated but the mask has been dropped in favor of executing them speculatively unmasked. See https://reviews.llvm.org/D66688 and test load-deref-pred.ll. We should drop the flags from these cases as well and the only way I found is using Legal->blockNeedsPredication. I know it's not ideal but Legal is still widely used during codegen.

Hmm, that's unfortunate! There are still plenty of uses of Legal, but each new use we add makes things more difficult to transition. Let me take a look if there's another alternative, but if there's none it's not the end of the world for now.

1225

I tried reading the discussion again, but I'm not sure why it would matter for the test case whether the sub gets widened to a vector or not. In the test case, the issue should still be the same as in the motivating test case: the flags on the instruction can poison (some) vector lanes, independent of whether the inputs were poison originally.

In the example, the sub nuw nsw gets widened, but similar to the other test cases, the first lane is used in the address computation of the masked memory operation. Effectively I am not sure I see the difference whether we compute only the first lane (because we scalarized the sub) or if we compute all lanes and use the first lane only in a UB-on-poison op and have other non-UB-on-poison uses of the full vector (storing a vector with (some) poison lanes should not be UB).

If we only restrict this to VPReplicateRecipe, it seems like we can easily run into slight variations of the motivating test case where we still have the same issue as the patch is trying to fix.

Now, if we support dropping flags from non-BB-local instructions/recipes, there are cases where we may not need to drop flags, e.g. because we can prove that poison generated by the instruction causes UB unconditionally. In those cases, I think for now our priority should be correctness, as it's unlikely for poison-generating flags to make a notable difference for the vector loops during codegen.

1249

I think for consistency with the LLVM style used here variables should start with an upper case letter (https://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly)

llvm/test/Transforms/LoopVectorize/X86/drop-poison-generating-flags.ll
82–83

Should nuw nsw be retained in the source here, and the same for inbounds? It looks like they got dropped.

dcaballe added inline comments. Tue, Nov 9, 2:33 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1219

Sure! Note that blockNeedsPredication is used in quite a few places so all of them should be addressed in the same way.

1225

The semantics of poison-generating flags on vector instructions are not clear to me, even after the discussion. I added support for those cases initially, then removed it... It should be easy to add them again, but I want to make sure we agree on that before introducing any changes. If those cases need further discussion, I'll be happy to address them in a separate commit. I'm not too concerned about those cases since I don't think LLVM can replace the value of a specific lane with a poison value at this point.

Now, if we support dropping flags from non-BB-local instructions/recipes, there are cases where we may not need to drop flags, e.g. because we can prove that poison generated by the instruction causes UB unconditionally. In those cases, I think for now our priority should be correctness, as it's unlikely for poison-generating flags to make a notable difference for the vector loops during codegen.

I'm not sure I fully understand this part, but the key point, as discussed previously, is the usage of the poison value. We may have an instruction that generates a poison value unconditionally. It should be ok as long as the potential poison value is not used. We should drop the flags if the potential poison value happens to be used after vectorization. That's exactly what the latest implementation is doing.

llvm/test/Transforms/LoopVectorize/X86/drop-poison-generating-flags.ll
82–83

This is the test you asked me to add, isn't it?

Checking Instr's location here might not be sufficient in any case, I think. There could be cases where Instr is in a block that does not need predication but feeds a memory instruction in a block that needs predication. In those cases, I think we should also drop the flags from Instr. In any case, it might be good to add such a test case.

sub and getelementptr are not predicated but feed a load that is predicated.

dcaballe updated this revision to Diff 385838. Tue, Nov 9, 8:27 AM

Fixed naming convention issues

dcaballe updated this revision to Diff 385902. Tue, Nov 9, 11:01 AM

Since it was already implemented in a previous version of the code, I restored the logic to drop flags
from Widen and WidenGEP recipes, following Florian's suggestion. These changes only impact Florian's
test case. I modified that test case to also have a vector GEP feeding a masked load, so that use case
is covered as well.

Let me know if you have any further comments.
Thanks

Any other comments? :)

Thanks for the update! I left a few small remaining comments. Basically LGTM, but it would still be good to hear @nlopes' thoughts on the recent responses with respect to widened instructions.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1213

I think it would be helpful to add a comment spelling out what we are looking for here.

1216

I'm not entirely sure why we need to check the parent region/replicator? It would be great if you could include the reasoning in the comment above, or remove if it is not needed.

1231

nit: it might be more explicit to pass the plan as argument instead of accessing it from State.

dcaballe updated this revision to Diff 387402. Mon, Nov 15, 2:13 PM
dcaballe marked 2 inline comments as done.
  • Updated and added more comments.
  • Simplified redundant condition.

Thanks, Florian! I addressed the feedback.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1216

Good catch! Actually, most of these conditions are redundant or irrelevant in the new algorithm since we already know this recipe is contributing to the address computation of a VPWidenMemoryRecipe or VPInterleaveRecipe. That simplifies the condition a lot. Thanks!

1231

We are also using other fields from State here, like VF and MayGeneratePoisonRecipes, and this is the only use of Plan. I'm not sure it's worth it.

I talked to Nuno in private and he mentioned that I could go ahead and commit the changes and address any minor feedback in a separate commit since he is very busy right now.
I'll commit this on Monday if no other comments.

Thank you all for the feedback!
Diego

fhahn accepted this revision. Sun, Nov 21, 1:58 PM

I talked to Nuno in private and he mentioned that I could go ahead and commit the changes and address any minor feedback in a separate commit since he is very busy right now.
I'll commit this on Monday if no other comments.

Thank you all for the feedback!
Diego

Sounds good to me! LGTM, thanks.

This patch already stretches my knowledge of the vectorizer code a bit, but the examples look good.
Thanks @fhahn for the test case. And apologies for the delay in getting back to this; I had a deadline Friday and was quite busy.

We should fuzz this thing. I'm still not super comfortable. I have a few theoretical concerns, but I don't know if they trigger in practice.

Thanks for all the feedback! I think this will cover a good part of the cases that could actually go wrong in practice. We can accommodate more cases as we see fit.