This is an archive of the discontinued LLVM Phabricator instance.


Differential D154953

[InstSimplify] Remove the remainder loop if we know the mask is always true
Closed, Public

Authored by Allen on Jul 11 2023, 5:13 AM.


Details

Reviewers

david-arm
dmgreen
sdesmalen
nikic
craig.topper
v01dXYZ
paulwalker-arm
goldstein.w.n

Commits

rG497966f7f2bb: Reland [InstSimplify] Remove the remainder loop if we know the mask is always…
rG3e386b227886: [InstSimplify] Remove the remainder loop if we know the mask is always true

Summary

In D146199, we check whether the loop trip count is known to be a power of 2 to determine whether the tail loop can be eliminated.
However, the remainder loop of a masked scalable loop can also be removed if we know the mask is always going to be true for every vector iteration.
This depends on the power-of-two vscale assumption from D155350.

Proofs: https://alive2.llvm.org/ce/z/bT62Wa

Fix https://github.com/llvm/llvm-project/issues/63616.
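A brute-force sketch of why the fold is sound. This is my own illustration, not code from the patch, and the helper names are hypothetical. If a step `P` is a power of two no larger than a power-of-two constant `PowC`, then `(P - 1) & PowC` is always 0. That is the same as saying `PowC % P == 0`, so, assuming the remainder is computed as `PowC urem P` (which becomes `and PowC, (P + -1)` for a power-of-two `P`), a power-of-two trip count leaves no remainder iterations. The second helper shows that the argument needs `P != 0`: `0 - 1` wraps to all ones, so the `and` returns `PowC` itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: exhaustively checks the fold
//   and (P + -1), PowC --> 0
// for every pair of 16-bit powers of two with P <= PowC. Returns false if
// any pair breaks the fold or leaves a non-zero remainder PowC % P.
bool foldHoldsFor16Bit() {
  for (unsigned I = 0; I < 16; ++I) {
    uint16_t PowC = uint16_t(1u << I);   // e.g. a power-of-two trip count
    for (unsigned J = 0; J <= I; ++J) {
      uint16_t P = uint16_t(1u << J);    // e.g. a power-of-two step
      uint16_t Mask = uint16_t(P - 1);   // low-bit mask below P
      if ((Mask & PowC) != 0 || PowC % P != 0)
        return false;
    }
  }
  return true;
}

// Hypothetical helper showing why P must be non-zero: with P == 0 the
// subtraction wraps to all ones, so the 'and' yields PowC instead of 0.
uint16_t maskWithZeroStep(uint16_t PowC) {
  assert(PowC != 0 && (PowC & (PowC - 1)) == 0 && "PowC must be a power of two");
  uint16_t P = 0;
  return uint16_t(uint16_t(P - 1) & PowC);
}
```

Sixteen bits keeps the exhaustive loop tiny; the same argument holds at any bit width.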

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

There are a very large number of changes, so older changes are hidden.

nikic requested changes to this revision. (Jul 11 2023, 6:45 AM)

nikic added a subscriber: nikic.

nikic added inline comments.

llvm/include/llvm/Transforms/InstCombine/InstCombiner.h
492 ↗	(On Diff #539044)	This needs to be either a data layout property or a function attribute. Though it would probably be best to change LangRef to require that vscale is always a power of two -- I think consensus has shifted towards non-pow2 vscales not being necessary.

This revision now requires changes to proceed. (Jul 11 2023, 6:45 AM)

Herald added a subscriber: StephenFan. (Jul 11 2023, 6:45 AM)

address comments

Added a new case in llvm/test/Transforms/InstCombine/AArch64/rem-mul-shl.ll because it needs the option -mtriple=aarch64-none-linux-gnu.

llvm/include/llvm/Transforms/InstCombine/InstCombiner.h
492 ↗	(On Diff #539044)	According to https://reviews.llvm.org/D141486, the documentation already clarifies that vscale is known to be a power of 2.
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1900 ↗	(On Diff #539029)	yes, apply your comment, thanks
1901 ↗	(On Diff #539029)	Apply your comment, thanks
1903 ↗	(On Diff #539029)	Yes, you are right, thanks

(Removing from review queue per previous comment -- you need to expose the pow2 property in a TTI-independent manner via one of LangRef, DataLayout or function attribute.)

This revision now requires changes to proceed. (Jul 11 2023, 7:40 AM)

In D154953#4489494, @nikic wrote:

> (Removing from review queue per previous comment -- you need to expose the pow2 property in a TTI-independent manner via one of LangRef, DataLayout or function attribute.)

Thanks for your comment. If I understand correctly, do you mean I need to add a new function attribute, for example vscale_pow2?

@nikic What's the rationale for not being allowed to use TTI during InstCombine? TTI is used for target-specific combines.

I don't like the idea of having to decorate every function with information that is essentially constant for the target. Changing the LangRef seems like a backward step given LLVM already supports non-power-of-two values of vscale, which will be much harder to re-add once lost. That said, if no target supports non-power-of-two values of vscale then I'll not fight to keep such support if that's the consensus.

As a halfway house, what if we changed the definition of vscale_range to imply vscale is power-of-two. Sure that's still a loss of functionality but it's smaller and critically maintains the path for supporting arbitrary vscale values whilst ensuring existing targets can encode the power-of-two-ness within the IR without needing any code changes.

Harbormaster completed remote builds in B244463: Diff 539082. (Jul 11 2023, 10:29 AM)

In D154953#4489861, @paulwalker-arm wrote:

> @nikic What's the rationale for not being allowed to use TTI during InstCombine? TTI is used for target-specific combines.

The general rationale is that it's a target-independent canonicalization pass. This isn't really relevant for this particular case (because the query is about legality rather than profitability).

The issue that is more relevant here is layering. The proper way to perform this fold is to teach isKnownToBeAPowerOfTwo() from ValueTracking about vscale. If you do that, you most likely don't even need any special code in InstCombine. However, we definitely don't want a TTI dependency in ValueTracking (which is used from literally everywhere, so we'd either have TTI dependencies everywhere, or get reduced functionality in most places). The information needs to be available in a target-independent way.
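To make the layering argument concrete, here is a toy sketch of a recursive "known power of two" query that reads the vscale fact from the function's own IR instead of asking a target hook, so any caller could use it without a TTI dependency. Everything in it is hypothetical: `Node`, `FunctionInfo` and `knownPow2OrZero` are not LLVM APIs and are far simpler than `isKnownToBeAPowerOfTwo()`. It deliberately answers "power of two or zero", because wrapping arithmetic can produce 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical miniature IR, just rich enough for expressions such as
// "vscale * 4" or "3 << vscale". Not LLVM's data structures.
enum class Op { VScale, Const, Mul, Shl };

struct Node {
  Op Kind;
  uint64_t Value;   // only meaningful for Op::Const
  const Node *LHS;  // operands for Op::Mul / Op::Shl
  const Node *RHS;
};

// Function-level facts taken from the IR itself, standing in for an
// attribute such as vscale_range (no target hook involved).
struct FunctionInfo {
  bool HasVScaleRange;
};

// Returns true if N is known to be a power of two *or zero*. Zero has to be
// allowed because a fixed-width product or left shift of powers of two can
// wrap around to 0.
bool knownPow2OrZero(const Node &N, const FunctionInfo &F) {
  switch (N.Kind) {
  case Op::VScale:
    // Assumed rule: the attribute implies vscale is a power of two.
    return F.HasVScaleRange;
  case Op::Const:
    return (N.Value & (N.Value - 1)) == 0;  // true for 0 and powers of two
  case Op::Mul:
    return knownPow2OrZero(*N.LHS, F) && knownPow2OrZero(*N.RHS, F);
  case Op::Shl:
    // A power of two shifted left (by an in-range amount) is still a power
    // of two or zero, whatever the amount is.
    return knownPow2OrZero(*N.LHS, F);
  }
  assert(false && "unknown opcode");
  return false;
}
```

With the attribute present, `vscale * 4` is classified as a power of two (or zero); without it, the same expression is not, and `3 << vscale` never is.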

> I don't like the idea of having to decorate every function with information that is essentially constant for the target.

DataLayout would provide a way to avoid decorating individual functions.

> Changing the LangRef seems like a backward step given LLVM already supports non-power-of-two values of vscale, which will be much harder to re-add once lost. That said, if no target supports non-power-of-two values of vscale then I'll not fight to keep such support if that's the consensus.

We shouldn't keep dead code around due to sunk costs. The dead code has ongoing costs (like, we wouldn't even have this conversation without it). Of course, if there are (current or foreseeable future) targets that have non-pow2 vscale then we should certainly keep it, but if not, then imho we should get rid of this at the root.

> As a halfway house, what if we changed the definition of vscale_range to imply vscale is power-of-two. Sure that's still a loss of functionality but it's smaller and critically maintains the path for supporting arbitrary vscale values whilst ensuring existing targets can encode the power-of-two-ness within the IR without needing any code changes.

Sounds reasonable to me.

So, with the halfway house above, can I now just delete the TTI.isVScaleKnownToBeAPowerOfTwo() check and assume that m_VScale() is a power of two for all targets? Or should I use a DataLayout property here?

Allen mentioned this in D154314: [LV] Remove the reminder loop if we know the mask is always true. (Jul 12 2023, 3:59 AM)

In D154953#4492037, @Allen wrote:

> So, with the halfway house above, can I now just delete the TTI.isVScaleKnownToBeAPowerOfTwo() check and assume that m_VScale() is a power of two for all targets? Or should I use a DataLayout property here?

To implement my suggestion you'll need to update the LangRef to document that vscale_range implies vscale is a power of two. This is best done as a separate patch because you'll also need to update the parsing of the attribute to reject values that are not a power of two. This patch should include the RISC-V folk so we've got agreement between the current targets that support scalable vectors.

At that point you should be able to implement the combine (or update ValueTracking) using purely information within the IR without any uses of TTI.

I'm curious how such an optimisation relates to other analyses such as ValueTracking or ScalarEvolution. Indeed, if we could use information about op0, such as its LSBs or its ScalarEvolution value, we could support non-constant values.

Thanks @v01dXYZ and @paulwalker-arm, I'm going to reimplement it based on ValueTracking.

nit: the title says reminder not remainder

Allen updated this revision to Diff 539860. (Jul 12 2023, 11:28 PM)

Allen retitled this revision from [InstCombine] Remove the reminder loop if we know the mask is always true to [InstCombine] Remove the remainder loop if we know the mask is always true.

Allen edited the summary of this revision.

Allen added reviewers: craig.topper, v01dXYZ, paulwalker-arm.

Harbormaster completed remote builds in B245000: Diff 539860. (Jul 13 2023, 1:44 AM)

I think this is a really useful patch @Allen - thank you! It's definitely a step in the right direction. I do have a few comments on the patch ...

llvm/lib/Analysis/ValueTracking.cpp
2014 ↗	(On Diff #539860)	Perhaps this is better done in a separate patch? That way we can see what tests are affected by this change alone, since it is quite significant. Also, in the same patch you will need to update the LangRef to say that using the vscale_range attribute implies vscale is a power of 2. Then, in a follow-on patch you can add the InstCombineMulDivRem.cpp change, which will show the tests that have changed purely due to the urem optimisation.
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1945 ↗	(On Diff #539860)	I could be wrong, but I suspect this is a fairly expensive function to call. Perhaps it's worth restructuring the code to only call it when you know Op0 is a power of 2? For example, something like: if (match(Op0, m_Power2(RemC))) { KnownBits Known = computeKnownBits(Op1, 0, &I); ... }
1950 ↗	(On Diff #539860)	I think you can avoid the `and` by simply returning null, i.e. return ConstantInt::getNullValue(Ty);
llvm/test/Transforms/InstCombine/AArch64/rem-mul-shl.ll
39 ↗	(On Diff #539860)	Based on your ValueTracking change I think you also need a negative test when vscale_range is set to something like (1,15)

paulwalker-arm added inline comments. (Jul 13 2023, 2:31 AM)

llvm/test/Transforms/InstCombine/AArch64/rem-mul-shl.ll
39 ↗	(On Diff #539860)	Based on my suggestion such a test cannot be written because it will be bad IR that will generate an error once the LangRef change is made, which needs to happen before we can have patches that rely on the new behaviour.

paulwalker-arm added inline comments. (Jul 13 2023, 2:38 AM)

llvm/lib/Analysis/ValueTracking.cpp
2019–2023 ↗	(On Diff #539860)	Based on my suggestion you shouldn't need to check the values of the `vscale_range` attribute. Simply having the attribute is enough to imply the power-of-two-ness.

(Removing from review queue as this needs a LangRef change first.)

This revision now requires changes to proceed. (Jul 13 2023, 2:40 AM)

Allen mentioned this in D155193: [LangRef] vscale_range implies the vscale is power-of-two. (Jul 13 2023, 6:05 AM)

Matt added a subscriber: Matt. (Jul 13 2023, 2:35 PM)

Allen mentioned this in rG7203286329de: [LangRef] vscale_range implies the vscale is power-of-two. (Jul 14 2023, 6:18 PM)

Allen mentioned this in D155350: [ValueTracking] Support vscale assumes for isKnownToBeAPowerOfTwo. (Jul 14 2023, 7:11 PM)

Allen added a parent revision: D155350: [ValueTracking] Support vscale assumes for isKnownToBeAPowerOfTwo. (Jul 15 2023, 1:21 AM)

Allen mentioned this in rG4d2723bd001f: [ValueTracking] Support vscale assumes for isKnownToBeAPowerOfTwo. (Jul 15 2023, 4:44 AM)

Allen added a parent revision: D146199: [LoopVectorize] Don't tail-fold for scalable VFs when there is no scalar tail. (Jul 15 2023, 4:51 AM)

rebase after the separated commit D155350

Allen marked 4 inline comments as done. (Jul 16 2023, 7:31 PM)

Allen added inline comments.

llvm/lib/Analysis/ValueTracking.cpp
2014 ↗	(On Diff #539860)	Done, thanks
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1945 ↗	(On Diff #539860)	apply your comment, thanks
1950 ↗	(On Diff #539860)	There is a compilation problem if I return ConstantInt::getNullValue(Ty) directly: error: cannot convert 'llvm::Constant' to 'llvm::Instruction' in return

Harbormaster completed remote builds in B245715: Diff 540848. (Jul 16 2023, 8:22 PM)

paulwalker-arm added inline comments. (Jul 17 2023, 3:29 AM)

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1950 ↗	(On Diff #539860)	I believe this suggests that, rather than being an InstCombine fold, this transformation should live in InstSimplify (e.g. simplifyURemInst).

Refactor into InstSimplify, according to the comment.

Allen marked an inline comment as done. (Jul 17 2023, 5:05 AM)

Allen added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
1950 ↗	(On Diff #539860)	Apply your comment, thanks

nikic added a reviewer: goldstein.w.n. (Jul 17 2023, 5:48 AM)

Please add an alive2 proof to the patch description.

llvm/lib/Analysis/InstructionSimplify.cpp
1286	This check looks a bit roundabout. I think what you actually want to check is that `Known.getMaxValue().ule(*RemC)`?

Harbormaster completed remote builds in B245794: Diff 540957. (Jul 17 2023, 7:38 AM)

goldstein.w.n added inline comments. (Jul 17 2023, 9:08 AM)

llvm/lib/Analysis/InstructionSimplify.cpp
1283	move the match of Op0 against a constant to before the far more expensive `isKnownToBeAPowerOfTwo` check on Op1.
1288	Since we are already here, might as well also add an `else if(Known.getMinValue().ugt(*RemC)) { return Op0; }` Proofs: https://alive2.llvm.org/ce/z/FkTMoy

Allen updated this revision to Diff 541364. (Jul 18 2023, 12:44 AM)

Allen marked an inline comment as done.

nikic added inline comments. (Jul 18 2023, 12:55 AM)

llvm/lib/Analysis/InstructionSimplify.cpp
1285	This should be just `RemC->uge(Known.getMaxValue())`. Same below.
1288	It looks like these proofs also work with urem replaced by srem: https://alive2.llvm.org/ce/z/-8RHjq So we should move this into simplifyRem and support both.
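The two rem folds discussed above (a power-of-two constant `RemC` rem a power of two `P` gives 0 when `P <= RemC`, and `RemC` itself when `P > RemC`), and nikic's observation that they also hold for srem, can be spot-checked exhaustively at 8 bits. This is only a brute-force sanity sketch under that bit-width assumption, not an alive2 proof, and `expectedRem`/`remFoldsHold` are hypothetical names. The comparisons are unsigned (matching `ule`/`ugt`) even when the bits are reinterpreted as signed for srem.

```cpp
#include <cassert>
#include <cstdint>

// Expected result of "RemC rem P" when both are powers of two:
//   P <= RemC (unsigned compare)  -->  0      (first fold)
//   P >  RemC (unsigned compare)  -->  RemC   (suggested second fold)
uint8_t expectedRem(uint8_t RemC, uint8_t P) {
  assert(P != 0 && "P must be a non-zero power of two");
  return P <= RemC ? 0 : RemC;
}

// Brute-force check of both folds for urem and srem over all 8-bit powers
// of two. For srem the bits are reinterpreted as signed, so 0x80 is -128;
// the operands are promoted to int, so -128 % -128 cannot overflow.
bool remFoldsHold() {
  for (unsigned I = 0; I < 8; ++I) {
    for (unsigned J = 0; J < 8; ++J) {
      uint8_t RemC = uint8_t(1u << I);
      uint8_t P = uint8_t(1u << J);
      uint8_t URem = uint8_t(RemC % P);
      uint8_t SRem = uint8_t(int(int8_t(RemC)) % int(int8_t(P)));
      uint8_t Want = expectedRem(RemC, P);
      if (URem != Want || SRem != Want)
        return false;
    }
  }
  return true;
}
```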

Allen marked an inline comment as done. (Jul 18 2023, 12:59 AM)

Allen added inline comments.

llvm/lib/Analysis/InstructionSimplify.cpp
1283	Done, thanks
1286	I still use getActiveBits() because Known.getMaxValue() is not necessarily a power of two. For example, we can easily get ConstantRange(8, 16) from vscale_range(1,16), but Known = ConstantRange(8, 16).toKnownBits() then gives Known.getMaxValue() == 31 instead of 16, so I can't compare the values directly. I also changed the boundary tests urem_vscale_range and urem_shl_vscale_out_of_range.
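The precision loss described above can be reproduced without LLVM. The helpers below are hypothetical and only mirror the idea behind `ConstantRange::toKnownBits()` and `KnownBits::getMaxValue()`; they are not those implementations, and they treat the range as inclusive (LLVM's `ConstantRange` upper bound is exclusive). For the values 8..16 the known-bits maximum comes out as 31, yet a power of two bounded by 31 has at most 5 active bits and therefore is at most 16, which is why comparing active bits still works.

```cpp
#include <cassert>
#include <cstdint>

// Number of significant bits in V (0 for V == 0), in the spirit of
// APInt::getActiveBits().
unsigned activeBits(uint32_t V) {
  unsigned N = 0;
  while (V != 0) {
    ++N;
    V >>= 1;
  }
  return N;
}

// Largest value consistent with the known bits of the inclusive unsigned
// range [Lo, Hi]: bits above the highest bit where Lo and Hi differ are
// known (taken from Lo); that bit and everything below it are unknown and
// are set to 1 here.
uint32_t knownBitsMaxFromRange(uint32_t Lo, uint32_t Hi) {
  assert(Lo <= Hi && "expects a non-wrapping range");
  unsigned UnknownBits = activeBits(Lo ^ Hi);
  uint32_t UnknownMask = UnknownBits == 32 ? ~0u : (1u << UnknownBits) - 1;
  return Lo | UnknownMask;
}

// If the value is also known to be a power of two, only the bit width of
// the bound matters: the largest power of two with that many bits.
uint32_t maxPow2WithinBound(uint32_t KnownMax) {
  unsigned Bits = activeBits(KnownMax);
  return Bits == 0 ? 0 : 1u << (Bits - 1);
}
```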

nikic added inline comments. (Jul 18 2023, 1:17 AM)

llvm/lib/Analysis/InstructionSimplify.cpp
1286	Okay, I see. Would it work if you used computeConstantRange() instead of computeKnownBits()? That should give an accurate range for vscale.

Harbormaster completed remote builds in B246102: Diff 541364. (Jul 18 2023, 4:10 AM)

Refactor to use computeConstantRange, according to the comment.

Allen marked 4 inline comments as done. (Jul 18 2023, 4:56 AM)

Allen added inline comments.

llvm/lib/Analysis/InstructionSimplify.cpp
1286	thanks, the new API computeConstantRange works

Harbormaster completed remote builds in B246167: Diff 541459. (Jul 18 2023, 9:46 AM)

update the Upper bound

Harbormaster completed remote builds in B246520: Diff 541971. (Jul 19 2023, 7:29 AM)

Can you add the alive2 links to the summary / commit message?

goldstein.w.n added inline comments. (Jul 19 2023, 10:32 AM)

llvm/lib/Analysis/ValueTracking.cpp
8420 ↗	(On Diff #541971)	Does `vscale` also enforce that `C` is within BitWidth range? Otherwise do we need a check here?
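The question matters because in LLVM IR a `shl` whose shift amount is equal to or larger than the bit width produces poison (and in C++ such a shift is undefined behaviour), so a range computed for `vscale << C` is only meaningful when `C < BitWidth`. A minimal sketch of such a guard, using a hypothetical helper `maxOfShiftedVScale` that is not the code from the patch:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: upper bound of (vscale << C) in a BitWidth-bit
// integer, given vscale <= VScaleMax. Returns false (no bound) when the
// shift amount is out of range, since shl by >= BitWidth is poison in IR
// (and undefined behaviour in C++), or when the bound would not fit.
bool maxOfShiftedVScale(uint64_t VScaleMax, unsigned C, unsigned BitWidth,
                        uint64_t &Result) {
  assert(BitWidth >= 1 && BitWidth <= 64 && "unsupported bit width");
  if (C >= BitWidth)
    return false;  // reject before shifting: no meaningful range
  uint64_t Limit = BitWidth == 64 ? ~0ull : (1ull << BitWidth) - 1;
  if (VScaleMax > (Limit >> C))
    return false;  // VScaleMax << C would overflow BitWidth bits
  Result = VScaleMax << C;
  return true;
}
```

Checking the shift amount first also keeps the helper itself free of undefined shifts.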

Add a check on the range of the shift amount, according to the comment.

I didn't add an alive2 proof because alive2 does not support vscale: https://github.com/AliveToolkit/alive2/issues/923

Harbormaster completed remote builds in B246847: Diff 542418. (Jul 20 2023, 5:45 AM)

In D154953#4518239, @Allen wrote:

> I didn't add an alive2 proof because alive2 does not support vscale: https://github.com/AliveToolkit/alive2/issues/923

The patch doesn't rely on alive2, just on the power of 2 nature of the arguments.
You can use: https://alive2.llvm.org/ce/z/FkTMoy + adding the srem versions.

nikic added inline comments. (Jul 20 2023, 9:22 AM)

llvm/lib/Analysis/InstructionSimplify.cpp
1286	Hrm, it looks like computeConstantRange() only works with an additional special case for shl of vscale. That's ... not great, and probably fragile, because the same thing will happen with more complex patterns. I think now that I understand why you did this, I would prefer to go back to your previous patch that used getActiveBits(). Just add a comment that we need to use getActiveBits() to make use of the additional power of two knowledge.

It seems alive2 doesn't support the semantics of vscale: https://github.com/AliveToolkit/alive2/issues/923

In D154953#4519341, @goldstein.w.n wrote:

> In D154953#4518239, @Allen wrote:
>
> > I didn't add an alive2 proof because alive2 does not support vscale: https://github.com/AliveToolkit/alive2/issues/923
>
> The patch doesn't rely on alive2, just on the power of 2 nature of the arguments.
> You can use: https://alive2.llvm.org/ce/z/FkTMoy + adding the srem versions.

The case in the link can't be optimized by this patch because isKnownToBeAPowerOfTwo currently can't infer that the operands of the urem are powers of two, so I'll try it in a separate patch.

Address comments:
a) revert to the check using getActiveBits;
b) as the rem --> and transformation is enabled by D155350, this transformation now lives in simplifyAndInst instead of simplifyURemInst.

Harbormaster completed remote builds in B247118: Diff 542781. (Jul 21 2023, 2:16 AM)

ping ?

In D154953#4521461, @Allen wrote:

> It seems alive2 doesn't support the semantics of vscale: https://github.com/AliveToolkit/alive2/issues/923
>
> In D154953#4519341, @goldstein.w.n wrote:
>
> > In D154953#4518239, @Allen wrote:
> >
> > > I didn't add an alive2 proof because alive2 does not support vscale: https://github.com/AliveToolkit/alive2/issues/923
> >
> > The patch doesn't rely on alive2, just on the power of 2 nature of the arguments.
> > You can use: https://alive2.llvm.org/ce/z/FkTMoy + adding the srem versions.
>
> The case in the link can't be optimized by this patch because isKnownToBeAPowerOfTwo currently can't infer that the operands of the urem are powers of two, so I'll try it in a separate patch.

Yes, but the link shows that the transform is semantically equivalent. The case in the link covers anything we detect in the patch (assuming non-buggy code).

In D154953#4526886, @Allen wrote:

> ping ?

Can you:

- Add some tests that aren't reliant on vscale (just some simple cases are fine).
- Add an alive2 link (I would still like one).

Code change looks fine to me.

(Sorry, misclicked approve earlier).

This revision now requires changes to proceed. (Jul 24 2023, 4:29 PM)

Add 2 new cases with alive2 link proof

> Can you:
>
> - Add some tests that aren't reliant on vscale (just some simple cases are fine).
> - Add an alive2 link (I would still like one).
>
> Code change looks fine to me.

Hi @goldstein.w.n, as you suggested, I added 2 new cases, and_add_shl and and_add_shl_overlap, with alive2 links.
Because canonicalizeLowbitMask always folds (1 << x) - 1 into ~(-1 << x), I also added extra code to match this form.
I also tried to debug something like https://alive2.llvm.org/ce/z/r5XLZj, and found that the assume applies to op0_p2 instead of pow2 itself, so that case needs to be handled differently.
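Since, per the note above, InstCombine canonicalizes the low-bit mask `(1 << x) - 1` to `~(-1 << x)`, a matcher for `pow2 + -1` also has to recognise the `not (shl -1, x)` shape. The two forms agree for every in-range shift amount, which a quick exhaustive check over 32-bit values confirms. This is an illustration of the identity only, not the matcher from the patch; the helper names are hypothetical, and unsigned arithmetic (`~0u` for all ones) avoids the C++ undefined behaviour of left-shifting a negative number.

```cpp
#include <cassert>
#include <cstdint>

// (1 << X) - 1: the low-bit mask as usually written in source.
uint32_t lowBitMaskSub(unsigned X) {
  assert(X < 32 && "shift amount must be in range");
  return (1u << X) - 1u;
}

// ~(-1 << X): the canonical form, with all ones written as ~0u so the
// shift operates on an unsigned value.
uint32_t lowBitMaskNot(unsigned X) {
  assert(X < 32 && "shift amount must be in range");
  return ~(~0u << X);
}

// Exhaustively compares both forms for every in-range shift amount.
bool lowBitMaskFormsAgree() {
  for (unsigned X = 0; X < 32; ++X)
    if (lowBitMaskSub(X) != lowBitMaskNot(X))
      return false;
  return true;
}
```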

Harbormaster completed remote builds in B247889: Diff 543839. (Jul 25 2023, 6:33 AM)

Allen edited the summary of this revision. (Jul 26 2023, 6:02 PM)

goldstein.w.n added inline comments. (Jul 28 2023, 10:14 AM)

llvm/lib/Analysis/InstructionSimplify.cpp
2135	`m_Not` instead of `m_Xor(v, -1)`. Also the comment doesn't quite match the code.

Allen updated this revision to Diff 545320. (Jul 28 2023, 6:25 PM)

Allen marked an inline comment as done. (Jul 28 2023, 6:28 PM)

Allen added inline comments.

llvm/lib/Analysis/InstructionSimplify.cpp
2135	Apply your comment, thanks. (Also adjust the comment)

Harbormaster completed remote builds in B248960: Diff 545320. (Jul 28 2023, 7:14 PM)

goldstein.w.n added inline comments. (Jul 28 2023, 9:56 PM)

llvm/test/Transforms/InstCombine/and-add-shl.ll
40 ↗	(On Diff #545320)	You only have a test for the first case (need one for the `not` case). Also, can you precommit the tests?

precommit test on D156591

Allen added a parent revision: D156591: [tests] precommit tests for D154953. (Jul 29 2023, 2:03 AM)

Harbormaster completed remote builds in B248979: Diff 545348. (Jul 29 2023, 2:03 AM)

Allen marked an inline comment as done. (Jul 29 2023, 2:07 AM)

Allen added inline comments.

llvm/test/Transforms/InstCombine/and-add-shl.ll
40 ↗	(On Diff #545320)	add a new case and_not_shl for `not` case, and also precommit tests on D156591, thanks.

Fixes test Transforms/LoopVectorize/AArch64/sve-widen-phi.ll

Harbormaster completed remote builds in B249112: Diff 545516. (Jul 31 2023, 12:59 AM)

LGTM.

rebase to top

This revision was not accepted when it landed; it landed in state Needs Review. (Jul 31 2023, 8:23 PM)

This revision was landed with ongoing or failed builds.

Closed by commit rG3e386b227886: [InstSimplify] Remove the remainder loop if we know the mask is always true (authored by Allen).

This revision was automatically updated to reflect the committed changes.

Allen added a commit: rG3e386b227886: [InstSimplify] Remove the remainder loop if we know the mask is always true.

Allen mentioned this in rG44d14a13a95f: [tests] precommit tests for D154953.

Harbormaster completed remote builds in B249382: Diff 545903. (Jul 31 2023, 9:02 PM)

nikic added a reverting change: rGeb9fce092a7d: Revert "[InstSimplify] Remove the remainder loop if we know the mask is always…". (Aug 1 2023, 12:03 AM)

> Proofs: https://alive2.llvm.org/ce/z/FkTMoy

These proofs are for a different transform (urem x) than what was implemented (and x - 1).

llvm/lib/Analysis/InstructionSimplify.cpp
2143	I don't think this second fold should be added. This is something that can be handled via simple range propagation. In fact, IPSCCP does handle this already. We could make CVP handle it as well, if we wanted.

In D154953#4549538, @nikic wrote:

> > Proofs: https://alive2.llvm.org/ce/z/FkTMoy
>
> These proofs are for a different transform (urem x) than what was implemented (and x - 1).

Sorry :(

I figured it was okay, since urem by a power of two == and with a mask.
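The reasoning above rests on the standard identity `x urem p == x & (p - 1)` for a non-zero power of two `p`. It explains why the urem proof and the implemented `and (p + -1)` form are closely related, although, as nikic notes, the proof should still be written for the form that is actually folded. A brute-force check over every 8-bit `x` and power of two `p` (the helper name is hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Brute-force check of the identity  X urem P == X & (P - 1)  for every
// 8-bit X and every 8-bit power of two P.
bool uremPow2IsMask() {
  for (unsigned X = 0; X < 256; ++X) {
    for (unsigned J = 0; J < 8; ++J) {
      uint8_t P = uint8_t(1u << J);
      assert(P != 0 && "P must be a non-zero power of two");
      uint8_t ViaRem = uint8_t(X % P);
      uint8_t ViaAnd = uint8_t(X & (P - 1u));
      if (ViaRem != ViaAnd)
        return false;
    }
  }
  return true;
}
```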

Allen edited the summary of this revision. (Aug 1 2023, 1:04 AM)

Remove the 2nd fold, and update the alive2 link to use and directly, https://alive2.llvm.org/ce/z/bT62Wa

Harbormaster completed remote builds in B249443: Diff 545994. (Aug 1 2023, 3:33 AM)

Allen reopened this revision. (Aug 1 2023, 3:35 AM)

Allen marked an inline comment as done.

Allen added inline comments.

llvm/lib/Analysis/InstructionSimplify.cpp
2143	Thanks, I'll try this with CVP, and have removed the 2nd fold for now.

nikic added inline comments. (Aug 1 2023, 4:02 AM)

llvm/lib/Analysis/InstructionSimplify.cpp
83	Unnecessary change.
llvm/test/Transforms/InstCombine/rem-mul-shl.ll
887	Please also add tests that are directly in the add -1 form (as that's what is actually being folded). We should also test the case where the constant operand is not a power of two, as it is a pre-condition of your transform. (Actually, we don't really need a power of two, it would be sufficient if it does not have any bits that may be part of the mask set. But it's a requirement for your current implementation.)

Address comments

Harbormaster completed remote builds in B249461: Diff 546018. (Aug 1 2023, 6:14 AM)

Allen updated this revision to Diff 546019. (Aug 1 2023, 6:18 AM)

Allen marked 2 inline comments as done.

Harbormaster completed remote builds in B249462: Diff 546019. (Aug 1 2023, 6:19 AM)

LGTM

llvm/test/Transforms/InstCombine/rem-mul-shl.ll
901	This one does have low bits set (it would be a negative test that cannot be folded). An example that can be folded is constant 3072 (3 * 1024).
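The more general condition nikic mentions can be checked the same way: `(P - 1) & K` is zero for every power of two `P <= MaxP` exactly when `K` has no bits set below `MaxP`. That holds for 3072 (`0b1100'0000'0000`) with `MaxP = 1024`. In this sketch, `MaxP = 1024` is an assumption chosen purely for illustration, the helper names are hypothetical, and 3584 is a made-up counterexample (it sets bit 9), not a value from the patch's tests.

```cpp
#include <cassert>
#include <cstdint>

// True if (P - 1) & K == 0 for every power of two P <= MaxP, checked by
// enumerating the powers of two.
bool foldsForAllSteps(uint32_t K, uint32_t MaxP) {
  assert(MaxP != 0 && (MaxP & (MaxP - 1)) == 0 && "MaxP must be a power of two");
  for (unsigned I = 0; I < 32; ++I) {
    uint32_t P = 1u << I;
    if (P > MaxP)
      break;
    if (((P - 1) & K) != 0)
      return false;
  }
  return true;
}

// Closed form of the same condition: K has no set bits below MaxP.
bool noBitsBelow(uint32_t K, uint32_t MaxP) { return (K & (MaxP - 1)) == 0; }
```

The closed form is what makes a power of two unnecessary: only the low bits of the constant matter.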

This revision is now accepted and ready to land. (Aug 1 2023, 6:21 AM)

nikic added inline comments. (Aug 1 2023, 6:21 AM)

llvm/lib/Analysis/InstructionSimplify.cpp
2121	Shift -> X to match the comment. This value doesn't need to be a shift.

nikic added inline comments. (Aug 1 2023, 6:23 AM)

llvm/lib/Analysis/InstructionSimplify.cpp
2121	Ignore this comment, X is not the same value. (Shift is not a great name, but I'm not sure what to call it right now.)

Thanks, changed the constant to 3072 in test and_add_shl_vscale_not_power2, according to the comment.

Harbormaster completed remote builds in B249465: Diff 546022. (Aug 1 2023, 6:27 AM)

rebase to top

This revision was landed with ongoing or failed builds. (Aug 1 2023, 7:23 AM)

Closed by commit rG497966f7f2bb: Reland [InstSimplify] Remove the remainder loop if we know the mask is always… (authored by Allen).

This revision was automatically updated to reflect the committed changes.

Allen added a commit: rG497966f7f2bb: Reland [InstSimplify] Remove the remainder loop if we know the mask is always….

Harbormaster completed remote builds in B249479: Diff 546042. (Aug 1 2023, 8:02 AM)

Allen mentioned this in D156845: [ConstantRange] Calculate precise range for shl by -1. (Aug 1 2023, 7:36 PM)

This commit caused misoptimizations in the WMA decoder in ffmpeg, observed on all architectures. The misoptimization can be observed with https://martin.st/temp/wma-preproc.c, compiled with clang -target aarch64-linux-gnu -c -O3 wma-preproc.c -o libavcodec/wma.o.

For a full runtime reproducible case:

$ git clone git://source.ffmpeg.org/ffmpeg
$ mkdir ffmpeg-build
$ cd ffmpeg-build
$ ../ffmpeg/configure --cc=clang --samples=../fate-samples
$ make -j$(nproc)
$ make fate-rsync
$ make fate-wmapro-2ch

If it takes a long time to fix, I'd appreciate reverting it in the meantime.

Sorry for the trouble.
Could you show how I can check, from the test output, whether the function was compiled correctly?
In addition, I see a library file named libswscale/libswscale.a. I don't know whether it emulates the SVE feature. If it does, note that the current optimization assumes vscale is a power of two, and I don't know whether that assumption affects the results.

In D154953#4553482, @Allen wrote:

> Sorry for the trouble.
> Could you show how I can check, from the test output, whether the function was compiled correctly?

With the steps outlined above, cloning ffmpeg and compiling it, if you run make -j$(nproc) fate-wmapro-2ch, it should print one TEST line and exit with a 0 exit code if the code was correctly compiled, or print an error if it was misoptimized.

> In addition, I see a library file named libswscale/libswscale.a. I don't know whether it emulates the SVE feature. If it does, note that the current optimization assumes vscale is a power of two, and I don't know whether that assumption affects the results.

That is an entirely unrelated library for scaling and color conversion of video frames, it has nothing to do with the ARM SVE feature.

To track this issue, I filed https://github.com/llvm/llvm-project/issues/64339; I'll continue to follow up on it there.

Allen added a child revision: D156881: [InstSimplify] Check the NonZero for power of two value. (Aug 2 2023, 5:39 AM)

foad mentioned this in D156881: [InstSimplify] Check the NonZero for power of two value. (Aug 2 2023, 8:49 AM)

Revision Contents

Path                                                                     Size
llvm/lib/Analysis/InstructionSimplify.cpp                                13 lines
llvm/test/Transforms/InstCombine/rem-mul-shl.ll                          39 lines
llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll   586 lines
llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll              51 lines

Diff 546049

llvm/lib/Analysis/InstructionSimplify.cpp


...
static Value *simplifySelectInst(Value *, Value *, Value *,
                                 const SimplifyQuery &, unsigned);
static Value *simplifyInstructionWithOperands(Instruction *I,
                                              ArrayRef<Value *> NewOps,
                                              const SimplifyQuery &SQ,
                                              unsigned MaxRecurse);

static Value *foldSelectWithBinaryOp(Value *Cond, Value *TrueVal,
                                     Value *FalseVal) {
  BinaryOperator::BinaryOps BinOpCode;
  if (auto *BO = dyn_cast<BinaryOperator>(Cond))
    BinOpCode = BO->getOpcode();
  else
    return nullptr;

  CmpInst::Predicate ExpectedPred, Pred1, Pred2;
  if (BinOpCode == BinaryOperator::Or) {
...
/// If not, this returns null.
static Value *simplifyURemInst(Value *Op0, Value *Op1, const SimplifyQuery &Q,
                               unsigned MaxRecurse) {
  return simplifyRem(Instruction::URem, Op0, Op1, Q, MaxRecurse);
}

Value *llvm::simplifyURemInst(Value *Op0, Value *Op1, const SimplifyQuery &Q) {
  return ::simplifyURemInst(Op0, Op1, Q, RecursionLimit);
}

/// Returns true if a shift by \c Amount always yields poison.
static bool isPoisonShift(Value *Amount, const SimplifyQuery &Q) {
  Constant *C = dyn_cast<Constant>(Amount);
  if (!C)
		goldstein.w.n, Done: Since we are already here, might as well also add an `else if (Known.getMinValue().ugt(*RemC)) { return Op0; }`. Proofs: https://alive2.llvm.org/ce/z/FkTMoy
		nikic, Done: It looks like these proofs also work with urem replaced by srem: https://alive2.llvm.org/ce/z/-8RHjq, so we should move this into simplifyRem and support both.
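A minimal standalone sketch of the suggested `else if` fold, checked exhaustively over 8-bit values. It models only the `urem` case (the srem variant depends on the preconditions in the linked proof and is not modeled here), and `remFoldsToDividend` / `checkUremFoldExhaustive` are hypothetical names, not LLVM APIs: when the divisor's smallest possible value is already greater than the constant dividend, `C urem X` is simply `C`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring Known.getMinValue().ugt(C): if every possible
// divisor X is at least MinX and MinX > C, then "C urem X" must equal C.
static bool remFoldsToDividend(uint8_t C, uint8_t MinX) { return MinX > C; }

// Exhaustive 8-bit check: whenever the fold fires, C % X == C for every X >= MinX.
static bool checkUremFoldExhaustive() {
  for (unsigned C = 0; C < 256; ++C)
    for (unsigned MinX = 1; MinX < 256; ++MinX) {
      if (!remFoldsToDividend(static_cast<uint8_t>(C), static_cast<uint8_t>(MinX)))
        continue;
      for (unsigned X = MinX; X < 256; ++X)
        if (C % X != C)
          return false;
    }
  return true;
}
```

For example, `remFoldsToDividend(5, 6)` holds, while `remFoldsToDividend(6, 6)` does not, because `6 urem 6` is 0.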
return false;		return false;

// X shift by undef -> poison because it may shift by the bitwidth.		// X shift by undef -> poison because it may shift by the bitwidth.
if (Q.isUndefValue(C))		if (Q.isUndefValue(C))
return true;		return true;

// Shifting by the bitwidth or more is poison. This covers scalars and		// Shifting by the bitwidth or more is poison. This covers scalars and
// fixed/scalable vectors with splat constants.		// fixed/scalable vectors with splat constants.
if (match(Op1, m_APInt(Mask))) {

// If all bits in the inverted and shifted mask are clear:		// If all bits in the inverted and shifted mask are clear:
// and (lshr X, ShAmt), Mask --> lshr X, ShAmt		// and (lshr X, ShAmt), Mask --> lshr X, ShAmt
if (match(Op0, m_LShr(m_Value(X), m_APInt(ShAmt))) &&		if (match(Op0, m_LShr(m_Value(X), m_APInt(ShAmt))) &&
(~(*Mask)).shl(*ShAmt).isZero())		(~(*Mask)).shl(*ShAmt).isZero())
return Op0;		return Op0;
}		}
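The condition above can be sanity-checked with a small 8-bit model. This is a sketch under that width assumption, with hypothetical helper names: `maskIsRedundant` mirrors `(~(*Mask)).shl(*ShAmt).isZero()` by truncating the shifted value to 8 bits, the same way APInt drops bits shifted past the width.

```cpp
#include <cassert>
#include <cstdint>

// 8-bit stand-in for (~Mask).shl(ShAmt).isZero(): the bits that Mask would
// clear all lie at or above the top (8 - ShAmt) bits after shifting.
static bool maskIsRedundant(uint8_t Mask, unsigned ShAmt) {
  uint8_t Inverted = static_cast<uint8_t>(~Mask);
  return static_cast<uint8_t>(Inverted << ShAmt) == 0;
}

// Exhaustive check: whenever the condition holds, (X >> ShAmt) & Mask == X >> ShAmt.
static bool checkLshrMaskFold() {
  for (unsigned ShAmt = 0; ShAmt < 8; ++ShAmt)
    for (unsigned Mask = 0; Mask < 256; ++Mask) {
      if (!maskIsRedundant(static_cast<uint8_t>(Mask), ShAmt))
        continue;
      for (unsigned X = 0; X < 256; ++X) {
        uint8_t Shifted = static_cast<uint8_t>(X >> ShAmt);
        if ((Shifted & Mask) != Shifted)
          return false;
      }
    }
  return true;
}
```

For instance, `lshr X, 4` can only produce bits 0..3, so a mask of `0x0F` is redundant while `0x07` is not.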

		// and 2^x-1, 2^C --> 0 where x <= C.
		const APInt *PowerC;
		Value *Shift;
		nikic, Not Done: Shift -> X to match the comment. This value doesn't need to be a shift.
		nikic, Not Done: Ignore this comment, X is not the same value. (Shift is not a great name, but I'm not sure what to call it right now.)
		if (match(Op1, m_Power2(PowerC)) &&
		match(Op0, m_Add(m_Value(Shift), m_AllOnes())) &&
		isKnownToBeAPowerOfTwo(Shift, Q.DL, /*OrZero*/ true, 0, Q.AC, Q.CxtI,
		Q.DT)) {
		KnownBits Known = computeKnownBits(Shift, Q.DL, 0, Q.AC, Q.CxtI, Q.DT);
		// Use getActiveBits() to make use of the additional power of two knowledge
		if (PowerC->getActiveBits() >= Known.getMaxValue().getActiveBits())
		return ConstantInt::getNullValue(Op1->getType());
		}
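The reason for comparing active bits rather than raw values (discussed in the thread above) can be shown with a standalone model. It is a sketch, not the LLVM code: `activeBits` re-implements what `APInt::getActiveBits()` reports for an unsigned value, and `andFoldsToZero` / `checkAndFold` are hypothetical names. It assumes that known bits can only bound vscale by 31 for a range ending at 16 (as Allen's comment describes), so for `vscale << 6` the known maximum is `31 << 6 = 1984`. A direct `1024 >= 1984` comparison fails, but both values have 11 active bits, and because the real operand is a power of two this is enough to prove `(X - 1) & 1024 == 0`.

```cpp
#include <cassert>
#include <cstdint>

// Index of the highest set bit plus one (0 for V == 0), i.e. the quantity
// APInt::getActiveBits() returns for an unsigned value.
static unsigned activeBits(uint64_t V) {
  unsigned N = 0;
  for (; V != 0; V >>= 1)
    ++N;
  return N;
}

// Model of the new check: PowerC is a power of two, X is a power of two with
// X <= KnownMax, so "(X - 1) & PowerC" is provably zero.
static bool andFoldsToZero(uint64_t PowerC, uint64_t KnownMax) {
  return activeBits(PowerC) >= activeBits(KnownMax);
}

// Brute-force soundness check over every power of two X <= KnownMax.
static bool checkAndFold(uint64_t PowerC, uint64_t KnownMax) {
  if (!andFoldsToZero(PowerC, KnownMax))
    return true; // fold does not fire, nothing to verify
  for (uint64_t X = 1; X != 0 && X <= KnownMax; X <<= 1)
    if (((X - 1) & PowerC) != 0)
      return false;
  return true;
}
```

With these assumptions, `andFoldsToZero(1024, 1984)` fires (mirroring `urem_vscale_range`), while `andFoldsToZero(1024, 31 << 10)` does not (mirroring `urem_shl_vscale_out_of_range`).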

// If we have a multiplication overflow check that is being 'and'ed with a		// If we have a multiplication overflow check that is being 'and'ed with a
// check that one of the multipliers is not zero, we can omit the 'and', and		// check that one of the multipliers is not zero, we can omit the 'and', and
// only keep the overflow check.		// only keep the overflow check.
if (isCheckForZeroAndMulWithOverflow(Op0, Op1, true))		if (isCheckForZeroAndMulWithOverflow(Op0, Op1, true))
		goldstein.w.n, Done: Use `m_Not` instead of `m_Xor(v, -1)`. Also, the comment doesn't quite match the code.
		Allen (Author), Done: Applied your suggestion, thanks (and adjusted the comment).
return Op1;		return Op1;
if (isCheckForZeroAndMulWithOverflow(Op1, Op0, true))		if (isCheckForZeroAndMulWithOverflow(Op1, Op0, true))
return Op0;		return Op0;
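The reasoning behind dropping the zero check can be illustrated for the unsigned case with a small exhaustive model. This sketch assumes 8-bit unsigned multiplication only (the real helper also handles other forms), and `umulOverflows8` / `checkZeroCheckIsRedundant` are hypothetical names: a product can only overflow if both operands are nonzero, so `(A != 0) && overflow` equals `overflow`.

```cpp
#include <cassert>
#include <cstdint>

// 8-bit unsigned multiply overflow, computed in a wider type.
static bool umulOverflows8(unsigned A, unsigned B) { return A * B > 0xFF; }

// Exhaustive check that the "A != 0" conjunct never changes the result.
static bool checkZeroCheckIsRedundant() {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B) {
      bool Ovf = umulOverflows8(A, B);
      if (((A != 0) && Ovf) != Ovf)
        return false;
    }
  return true;
}
```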

// A & (-A) = A if A is a power of two or zero.		// A & (-A) = A if A is a power of two or zero.
if (match(Op0, m_Neg(m_Specific(Op1))) ||		if (match(Op0, m_Neg(m_Specific(Op1))) ||
match(Op1, m_Neg(m_Specific(Op0)))) {		match(Op1, m_Neg(m_Specific(Op0)))) {
if (isKnownToBeAPowerOfTwo(Op0, Q.DL, /*OrZero*/ true, 0, Q.AC, Q.CxtI,		if (isKnownToBeAPowerOfTwo(Op0, Q.DL, /*OrZero*/ true, 0, Q.AC, Q.CxtI,
		nikic, Done: I don't think this second fold should be added. This is something that can be handled via simple range propagation. In fact, IPSCCP already handles this. We could make CVP handle it as well, if we wanted.
		Allen (Author), Done: Thanks, I'll try this with CVP, and now adopt the 2nd fold.
Q.DT))		Q.DT))
return Op0;		return Op0;
if (isKnownToBeAPowerOfTwo(Op1, Q.DL, /*OrZero*/ true, 0, Q.AC, Q.CxtI,		if (isKnownToBeAPowerOfTwo(Op1, Q.DL, /*OrZero*/ true, 0, Q.AC, Q.CxtI,
Q.DT))		Q.DT))
return Op1;		return Op1;
}		}
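The `A & (-A) = A` identity can be checked exhaustively at 8 bits. A minimal sketch with hypothetical helper names: `A & -A` isolates the lowest set bit, which is `A` itself exactly when `A` has at most one bit set.

```cpp
#include <cassert>
#include <cstdint>

// True for 0 and for values with exactly one bit set.
static bool isPowerOfTwoOrZero(uint8_t A) { return (A & (A - 1)) == 0; }

// Exhaustive 8-bit check of: A is a power of two or zero => (A & -A) == A.
static bool checkAndNegFold() {
  for (unsigned V = 0; V < 256; ++V) {
    uint8_t A = static_cast<uint8_t>(V);
    uint8_t NegA = static_cast<uint8_t>(-V); // two's-complement negation, 8-bit
    if (isPowerOfTwoOrZero(A) && static_cast<uint8_t>(A & NegA) != A)
      return false;
  }
  return true;
}
```

For a non-power-of-two such as 6, `6 & -6` is 2, which is why the `isKnownToBeAPowerOfTwo` guard is required.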

// This is a similar pattern used for checking if a value is a power-of-2:		// This is a similar pattern used for checking if a value is a power-of-2:

llvm/test/Transforms/InstCombine/rem-mul-shl.ll

;
%vscale = call i64 @llvm.vscale.i64()		%vscale = call i64 @llvm.vscale.i64()
%shift = shl nuw nsw i64 %vscale, 2		%shift = shl nuw nsw i64 %vscale, 2
%rem = urem i64 1024, %shift		%rem = urem i64 1024, %shift
ret i64 %rem		ret i64 %rem
}		}

define i64 @urem_shl_vscale_range() vscale_range(1,16) {		define i64 @urem_shl_vscale_range() vscale_range(1,16) {
; CHECK-LABEL: @urem_shl_vscale_range(		; CHECK-LABEL: @urem_shl_vscale_range(
; CHECK-NEXT: [[VSCALE:%.*]] = call i64 @llvm.vscale.i64()		; CHECK-NEXT: ret i64 0
; CHECK-NEXT: [[SHIFT:%.*]] = shl nuw nsw i64 [[VSCALE]], 2
; CHECK-NEXT: [[TMP1:%.*]] = add nuw nsw i64 [[SHIFT]], 2047
; CHECK-NEXT: [[REM:%.*]] = and i64 [[TMP1]], 1024
; CHECK-NEXT: ret i64 [[REM]]
;		;
%vscale = call i64 @llvm.vscale.i64()		%vscale = call i64 @llvm.vscale.i64()
%shift = shl nuw nsw i64 %vscale, 2		%shift = shl nuw nsw i64 %vscale, 2
%rem = urem i64 1024, %shift		%rem = urem i64 1024, %shift
ret i64 %rem		ret i64 %rem
}		}

define i64 @urem_vscale_range() vscale_range(1,16) {		define i64 @urem_vscale_range() vscale_range(1,16) {
; CHECK-LABEL: @urem_vscale_range(		; CHECK-LABEL: @urem_vscale_range(
; CHECK-NEXT: [[VSCALE:%.*]] = call i64 @llvm.vscale.i64()		; CHECK-NEXT: ret i64 0
; CHECK-NEXT: [[SHIFT:%.*]] = shl nuw nsw i64 [[VSCALE]], 6
; CHECK-NEXT: [[TMP1:%.*]] = add nuw nsw i64 [[SHIFT]], 2047
; CHECK-NEXT: [[REM:%.*]] = and i64 [[TMP1]], 1024
; CHECK-NEXT: ret i64 [[REM]]
;		;
%vscale = call i64 @llvm.vscale.i64()		%vscale = call i64 @llvm.vscale.i64()
%shift = shl nuw nsw i64 %vscale, 6		%shift = shl nuw nsw i64 %vscale, 6
%rem = urem i64 1024, %shift		%rem = urem i64 1024, %shift
ret i64 %rem		ret i64 %rem
}		}

define i64 @urem_shl_vscale_out_of_range() vscale_range(1,16) {		define i64 @urem_shl_vscale_out_of_range() vscale_range(1,16) {
; CHECK-NEXT: [[TMP1:%.*]] = add nuw nsw i64 [[SHIFT]], 2047		; CHECK-NEXT: [[TMP1:%.*]] = add nuw nsw i64 [[SHIFT]], 2047
; CHECK-NEXT: [[REM:%.*]] = and i64 [[TMP1]], 1024		; CHECK-NEXT: [[REM:%.*]] = and i64 [[TMP1]], 1024
; CHECK-NEXT: ret i64 [[REM]]		; CHECK-NEXT: ret i64 [[REM]]
;		;
%vscale = call i64 @llvm.vscale.i64()		%vscale = call i64 @llvm.vscale.i64()
%shift = shl nuw nsw i64 %vscale, 10		%shift = shl nuw nsw i64 %vscale, 10
%rem = urem i64 1024, %shift		%rem = urem i64 1024, %shift
ret i64 %rem		ret i64 %rem
}		}
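The expected results of the three `urem` tests above can be reproduced by enumeration. This sketch assumes vscale is a power of two in [1, 16] (vscale_range(1,16) plus the power-of-two vscale assumption from D155350); `uremIsAlwaysZero` is a hypothetical name, and `ShiftAmt` is the `shl` amount used in each test (2, 6, or 10).

```cpp
#include <cassert>
#include <cstdint>

// Returns true if "Dividend urem (vscale << ShiftAmt)" is 0 for every
// power-of-two vscale in [1, 16].
static bool uremIsAlwaysZero(uint64_t Dividend, unsigned ShiftAmt) {
  for (uint64_t VScale = 1; VScale <= 16; VScale <<= 1) {
    uint64_t Shift = VScale << ShiftAmt;
    // For a power-of-two divisor, urem equals "and Dividend, Shift - 1",
    // which is the form InstCombine produces in the CHECK lines.
    assert(Dividend % Shift == (Dividend & (Shift - 1)));
    if (Dividend % Shift != 0)
      return false;
  }
  return true;
}
```

Shift amounts 2 and 6 keep the divisor at or below 1024, so the result is always 0. With a shift of 10 and vscale = 2 the divisor is 2048 and `1024 urem 2048` is 1024, so `urem_shl_vscale_out_of_range` must stay unfolded.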
		nikic, Done: Please also add tests that are directly in the add -1 form (as that's what is actually being folded). We should also test the case where the constant operand is not a power of two, as that is a precondition of your transform. (Actually, we don't really need a power of two; it would be sufficient if the constant has no bits set that may be part of the mask. But it's a requirement for your current implementation.)

		define i64 @and_add_vscale_range_low() vscale_range(1,16) {
		; CHECK-LABEL: @and_add_vscale_range_low(
		; CHECK-NEXT: ret i64 0
		;
		%vscale = call i64 @llvm.vscale.i64()
		%shift = shl nuw nsw i64 %vscale, 6
		%add = add i64 %shift, -1
		%rem = and i64 1024, %add
		ret i64 %rem
		}

		; TODO: It would be sufficient for the constant to have no bits set that may be
		; part of the mask, but the current implementation expects a power of two.
		nikic, Not Done: This one does have low bits set (it would be a negative test that cannot be folded). An example that can be folded is the constant 3072 (3 * 1024).
		define i64 @and_add_shl_vscale_not_power2() vscale_range(1,16) {
		; CHECK-LABEL: @and_add_shl_vscale_not_power2(
		; CHECK-NEXT: [[VSCALE:%.*]] = call i64 @llvm.vscale.i64()
		; CHECK-NEXT: [[SHIFT:%.*]] = shl nuw nsw i64 [[VSCALE]], 6
		; CHECK-NEXT: [[ADD:%.*]] = add nuw nsw i64 [[SHIFT]], 4095
		; CHECK-NEXT: [[REM:%.*]] = and i64 [[ADD]], 3072
		; CHECK-NEXT: ret i64 [[REM]]
		;
		%vscale = call i64 @llvm.vscale.i64()
		%shift = shl nuw nsw i64 %vscale, 6
		%add = add i64 %shift, -1
		%rem = and i64 3072, %add
		ret i64 %rem
		}
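The relaxed condition nikic describes (and the TODO above) can be sketched as follows. This is a hypothetical generalization, not what the patch implements: if X is a power of two with X <= KnownMax, then `X - 1` can only set bits below position `activeBits(KnownMax) - 1`, so `(X - 1) & C` is zero whenever `C` has none of those bits. `activeBits`, `andFoldsToZeroGeneral`, and `checkGeneralFold` are illustrative names, and 1984 is assumed as the known-bits maximum of `vscale << 6`.

```cpp
#include <cassert>
#include <cstdint>

// Index of the highest set bit plus one (same quantity as APInt::getActiveBits()).
static unsigned activeBits(uint64_t V) {
  unsigned N = 0;
  for (; V != 0; V >>= 1)
    ++N;
  return N;
}

// Hypothetical relaxed fold: C needs no set bits where X - 1 may have bits.
static bool andFoldsToZeroGeneral(uint64_t C, uint64_t KnownMax) {
  unsigned K = activeBits(KnownMax);
  uint64_t MayBeSet = K == 0 ? 0 : (uint64_t(1) << (K - 1)) - 1;
  return (C & MayBeSet) == 0;
}

// Brute-force soundness check over every power of two X <= KnownMax.
static bool checkGeneralFold(uint64_t C, uint64_t KnownMax) {
  if (!andFoldsToZeroGeneral(C, KnownMax))
    return true; // fold does not fire, nothing to verify
  for (uint64_t X = 1; X != 0 && X <= KnownMax; X <<= 1)
    if (((X - 1) & C) != 0)
      return false;
  return true;
}
```

Under this model 3072 folds just like 1024 (neither has bits in the low ten positions), whereas a constant such as 1536 does not, because `1023 & 1536` is 512.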

llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll
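Why the updated CHECK lines compare against 512 directly and branch on `true` in `middle.block` can be sketched numerically for `test_array_load2_store2`. This assumes (the attribute group `#1` is not shown in this excerpt) that vscale is limited to [1, 16] and is a power of two. The scalar loop runs 512 iterations (indices 0, 2, ..., 1022), and each vector iteration covers VF = 4 * vscale of them. `vectorTripCount` and `remainderLoopIsDead` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// n.vec: number of scalar iterations covered by whole vector iterations.
static uint64_t vectorTripCount(uint64_t VScale) {
  const uint64_t TripCount = 512;
  uint64_t VF = 4 * VScale;
  return TripCount - TripCount % VF;
}

// For every power-of-two vscale in [1, 16], the old N_VEC expression
// "(vscale * 1020) & 512" equals n.vec, and n.vec covers the whole trip count.
static bool remainderLoopIsDead() {
  for (uint64_t VScale = 1; VScale <= 16; VScale <<= 1) {
    if (((VScale * 1020) & 512) != vectorTripCount(VScale))
      return false;
    if (vectorTripCount(VScale) != 512)
      return false;
  }
  return true;
}
```

Without the power-of-two assumption this breaks: a hypothetical vscale of 3 gives VF = 12 and n.vec = 504, leaving 8 iterations for the scalar remainder loop.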

	@AB = common global [1024 x i32] zeroinitializer, align 4			@AB = common global [1024 x i32] zeroinitializer, align 4
	@CD = common global [1024 x i32] zeroinitializer, align 4			@CD = common global [1024 x i32] zeroinitializer, align 4

	define void @test_array_load2_store2(i32 %C, i32 %D) #1 {			define void @test_array_load2_store2(i32 %C, i32 %D) #1 {
	; CHECK-LABEL: @test_array_load2_store2(			; CHECK-LABEL: @test_array_load2_store2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[DOTNEG:%.*]] = mul nuw nsw i64 [[TMP0]], 1020
	; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[DOTNEG]], 512
	; CHECK-NEXT: [[IND_END:%.*]] = shl nuw nsw i64 [[N_VEC]], 1
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[C:%.*]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[C:%.*]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[D:%.*]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[D:%.*]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT1]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT1]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX]], 1			; CHECK-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX]], 1
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds [1024 x i32], ptr @AB, i64 0, i64 [[OFFSET_IDX]]			; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds [1024 x i32], ptr @AB, i64 0, i64 [[OFFSET_IDX]]
	; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[TMP1]], align 4			; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[TMP0]], align 4
	; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])			; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])
	; CHECK-NEXT: [[TMP2:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0			; CHECK-NEXT: [[TMP1:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0
	; CHECK-NEXT: [[TMP3:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 1			; CHECK-NEXT: [[TMP2:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 1
	; CHECK-NEXT: [[TMP4:%.*]] = or i64 [[OFFSET_IDX]], 1			; CHECK-NEXT: [[TMP3:%.*]] = or i64 [[OFFSET_IDX]], 1
	; CHECK-NEXT: [[TMP5:%.*]] = add nsw <vscale x 4 x i32> [[TMP2]], [[BROADCAST_SPLAT]]			; CHECK-NEXT: [[TMP4:%.*]] = add nsw <vscale x 4 x i32> [[TMP1]], [[BROADCAST_SPLAT]]
	; CHECK-NEXT: [[TMP6:%.*]] = mul nsw <vscale x 4 x i32> [[TMP3]], [[BROADCAST_SPLAT2]]			; CHECK-NEXT: [[TMP5:%.*]] = mul nsw <vscale x 4 x i32> [[TMP2]], [[BROADCAST_SPLAT2]]
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds [1024 x i32], ptr @CD, i64 0, i64 [[TMP4]]			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [1024 x i32], ptr @CD, i64 0, i64 [[TMP3]]
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, ptr [[TMP7]], i64 -1			; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, ptr [[TMP6]], i64 -1
	; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = call <vscale x 8 x i32> @llvm.experimental.vector.interleave2.nxv8i32(<vscale x 4 x i32> [[TMP5]], <vscale x 4 x i32> [[TMP6]])			; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = call <vscale x 8 x i32> @llvm.experimental.vector.interleave2.nxv8i32(<vscale x 4 x i32> [[TMP4]], <vscale x 4 x i32> [[TMP5]])
	; CHECK-NEXT: store <vscale x 8 x i32> [[INTERLEAVED_VEC]], ptr [[TMP8]], align 4			; CHECK-NEXT: store <vscale x 8 x i32> [[INTERLEAVED_VEC]], ptr [[TMP7]], align 4
	; CHECK-NEXT: [[TMP9:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP8:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP10:%.*]] = shl nuw nsw i64 [[TMP9]], 2			; CHECK-NEXT: [[TMP9:%.*]] = shl nuw nsw i64 [[TMP8]], 2
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP10]]			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 512
	; CHECK-NEXT: br i1 [[TMP11]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N_NOT:%.*]] = icmp eq i64 [[N_VEC]], 0			; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK-NEXT: br i1 [[CMP_N_NOT]], label [[SCALAR_PH]], label [[FOR_END:%.*]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ poison, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: br i1 poison, label [[FOR_BODY]], label [[FOR_END]], !llvm.loop [[LOOP3:![0-9]+]]
	; CHECK-NEXT: [[ARRAYIDX0:%.*]] = getelementptr inbounds [1024 x i32], ptr @AB, i64 0, i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[LOAD1:%.*]] = load i32, ptr [[ARRAYIDX0]], align 4
	; CHECK-NEXT: [[OR:%.*]] = or i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds [1024 x i32], ptr @AB, i64 0, i64 [[OR]]
	; CHECK-NEXT: [[LOAD2:%.*]] = load i32, ptr [[ARRAYIDX1]], align 4
	; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[LOAD1]], [[C]]
	; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[LOAD2]], [[D]]
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds [1024 x i32], ptr @CD, i64 0, i64 [[INDVARS_IV]]
	; CHECK-NEXT: store i32 [[ADD]], ptr [[ARRAYIDX2]], align 4
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds [1024 x i32], ptr @CD, i64 0, i64 [[OR]]
	; CHECK-NEXT: store i32 [[MUL]], ptr [[ARRAYIDX3]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 2
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i64 [[INDVARS_IV]], 1022
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END]], !llvm.loop [[LOOP3:![0-9]+]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]

	@AB_i16 = common global [1024 x i16] zeroinitializer, align 4			@AB_i16 = common global [1024 x i16] zeroinitializer, align 4

	define void @test_array_load2_i16_store2(i32 %C, i32 %D) #1 {			define void @test_array_load2_i16_store2(i32 %C, i32 %D) #1 {
	; CHECK-LABEL: @test_array_load2_i16_store2(			; CHECK-LABEL: @test_array_load2_i16_store2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
	; CHECK-NEXT: [[DOTNEG:%.*]] = mul nuw nsw i64 [[TMP0]], 1020			; CHECK-NEXT: [[TMP1:%.*]] = shl <vscale x 4 x i64> [[TMP0]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[DOTNEG]], 512			; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[IND_END:%.*]] = shl nuw nsw i64 [[N_VEC]], 1			; CHECK-NEXT: [[TMP3:%.*]] = shl nuw nsw i64 [[TMP2]], 3
	; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()			; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP3]], i64 0
	; CHECK-NEXT: [[TMP2:%.*]] = shl <vscale x 4 x i64> [[TMP1]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP4:%.*]] = shl nuw nsw i64 [[TMP3]], 3
	; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP4]], i64 0
	; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[C:%.*]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[C:%.*]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT2:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[D:%.*]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT2:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[D:%.*]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT3:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT2]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT3:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT2]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[TMP2]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[TMP1]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds [1024 x i16], ptr @AB_i16, i64 0, <vscale x 4 x i64> [[VEC_IND]]			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [1024 x i16], ptr @AB_i16, i64 0, <vscale x 4 x i64> [[VEC_IND]]
	; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x i16> @llvm.masked.gather.nxv4i16.nxv4p0(<vscale x 4 x ptr> [[TMP5]], i32 2, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i16> poison)			; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x i16> @llvm.masked.gather.nxv4i16.nxv4p0(<vscale x 4 x ptr> [[TMP4]], i32 2, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i16> poison)
	; CHECK-NEXT: [[TMP6:%.*]] = or <vscale x 4 x i64> [[VEC_IND]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP5:%.*]] = or <vscale x 4 x i64> [[VEC_IND]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds [1024 x i16], ptr @AB_i16, i64 0, <vscale x 4 x i64> [[TMP6]]			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [1024 x i16], ptr @AB_i16, i64 0, <vscale x 4 x i64> [[TMP5]]
	; CHECK-NEXT: [[WIDE_MASKED_GATHER1:%.*]] = call <vscale x 4 x i16> @llvm.masked.gather.nxv4i16.nxv4p0(<vscale x 4 x ptr> [[TMP7]], i32 2, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i16> poison)			; CHECK-NEXT: [[WIDE_MASKED_GATHER1:%.*]] = call <vscale x 4 x i16> @llvm.masked.gather.nxv4i16.nxv4p0(<vscale x 4 x ptr> [[TMP6]], i32 2, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i16> poison)
	; CHECK-NEXT: [[TMP8:%.*]] = sext <vscale x 4 x i16> [[WIDE_MASKED_GATHER]] to <vscale x 4 x i32>			; CHECK-NEXT: [[TMP7:%.*]] = sext <vscale x 4 x i16> [[WIDE_MASKED_GATHER]] to <vscale x 4 x i32>
	; CHECK-NEXT: [[TMP9:%.*]] = add nsw <vscale x 4 x i32> [[BROADCAST_SPLAT]], [[TMP8]]			; CHECK-NEXT: [[TMP8:%.*]] = add nsw <vscale x 4 x i32> [[BROADCAST_SPLAT]], [[TMP7]]
	; CHECK-NEXT: [[TMP10:%.*]] = sext <vscale x 4 x i16> [[WIDE_MASKED_GATHER1]] to <vscale x 4 x i32>			; CHECK-NEXT: [[TMP9:%.*]] = sext <vscale x 4 x i16> [[WIDE_MASKED_GATHER1]] to <vscale x 4 x i32>
	; CHECK-NEXT: [[TMP11:%.*]] = mul nsw <vscale x 4 x i32> [[BROADCAST_SPLAT3]], [[TMP10]]			; CHECK-NEXT: [[TMP10:%.*]] = mul nsw <vscale x 4 x i32> [[BROADCAST_SPLAT3]], [[TMP9]]
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <vscale x 4 x i64> [[TMP6]], i64 0			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <vscale x 4 x i64> [[TMP5]], i64 0
	; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds [1024 x i32], ptr @CD, i64 0, i64 [[TMP12]]			; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds [1024 x i32], ptr @CD, i64 0, i64 [[TMP11]]
	; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds i32, ptr [[TMP13]], i64 -1			; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds i32, ptr [[TMP12]], i64 -1
	; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = call <vscale x 8 x i32> @llvm.experimental.vector.interleave2.nxv8i32(<vscale x 4 x i32> [[TMP9]], <vscale x 4 x i32> [[TMP11]])			; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = call <vscale x 8 x i32> @llvm.experimental.vector.interleave2.nxv8i32(<vscale x 4 x i32> [[TMP8]], <vscale x 4 x i32> [[TMP10]])
	; CHECK-NEXT: store <vscale x 8 x i32> [[INTERLEAVED_VEC]], ptr [[TMP14]], align 4			; CHECK-NEXT: store <vscale x 8 x i32> [[INTERLEAVED_VEC]], ptr [[TMP13]], align 4
	; CHECK-NEXT: [[TMP15:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP14:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP16:%.*]] = shl nuw nsw i64 [[TMP15]], 2			; CHECK-NEXT: [[TMP15:%.*]] = shl nuw nsw i64 [[TMP14]], 2
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP16]]			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP15]]
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]
	; CHECK-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], 512
	; CHECK-NEXT: br i1 [[TMP17]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N_NOT:%.*]] = icmp eq i64 [[N_VEC]], 0			; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK-NEXT: br i1 [[CMP_N_NOT]], label [[SCALAR_PH]], label [[FOR_END:%.*]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ poison, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: br i1 poison, label [[FOR_BODY]], label [[FOR_END]], !llvm.loop [[LOOP5:![0-9]+]]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [1024 x i16], ptr @AB_i16, i64 0, i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP18:%.*]] = load i16, ptr [[ARRAYIDX]], align 2
	; CHECK-NEXT: [[TMP19:%.*]] = or i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds [1024 x i16], ptr @AB_i16, i64 0, i64 [[TMP19]]
	; CHECK-NEXT: [[TMP20:%.*]] = load i16, ptr [[ARRAYIDX2]], align 2
	; CHECK-NEXT: [[CONV:%.*]] = sext i16 [[TMP18]] to i32
	; CHECK-NEXT: [[ADD3:%.*]] = add nsw i32 [[CONV]], [[C]]
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds [1024 x i32], ptr @CD, i64 0, i64 [[INDVARS_IV]]
	; CHECK-NEXT: store i32 [[ADD3]], ptr [[ARRAYIDX5]], align 4
	; CHECK-NEXT: [[CONV6:%.*]] = sext i16 [[TMP20]] to i32
	; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[CONV6]], [[D]]
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds [1024 x i32], ptr @CD, i64 0, i64 [[TMP19]]
	; CHECK-NEXT: store i32 [[MUL]], ptr [[ARRAYIDX9]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 2
	; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INDVARS_IV]], 1022
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END]], !llvm.loop [[LOOP5:![0-9]+]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %entry, %for.body			for.body: ; preds = %entry, %for.body
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]

	@CD_i16 = dso_local local_unnamed_addr global [1024 x i16] zeroinitializer, align 2			@CD_i16 = dso_local local_unnamed_addr global [1024 x i16] zeroinitializer, align 2

	define void @test_array_load2_store2_i16(i32 noundef %C, i32 noundef %D) #1 {			define void @test_array_load2_store2_i16(i32 noundef %C, i32 noundef %D) #1 {
	; CHECK-LABEL: @test_array_load2_store2_i16(			; CHECK-LABEL: @test_array_load2_store2_i16(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
	; CHECK-NEXT: [[DOTNEG:%.*]] = mul nuw nsw i64 [[TMP0]], 1020			; CHECK-NEXT: [[TMP1:%.*]] = shl <vscale x 4 x i64> [[TMP0]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[DOTNEG]], 512			; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[IND_END:%.*]] = shl nuw nsw i64 [[N_VEC]], 1			; CHECK-NEXT: [[TMP3:%.*]] = shl nuw nsw i64 [[TMP2]], 3
	; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()			; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP3]], i64 0
	; CHECK-NEXT: [[TMP2:%.*]] = shl <vscale x 4 x i64> [[TMP1]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP4:%.*]] = shl nuw nsw i64 [[TMP3]], 3
	; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP4]], i64 0
	; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[C:%.*]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[C:%.*]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[D:%.*]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[D:%.*]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT1]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT1]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[TMP2]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[TMP1]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX]], 1			; CHECK-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX]], 1
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds [1024 x i32], ptr @AB, i64 0, i64 [[OFFSET_IDX]]			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [1024 x i32], ptr @AB, i64 0, i64 [[OFFSET_IDX]]
	; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[TMP5]], align 4			; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[TMP4]], align 4
	; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])			; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])
	; CHECK-NEXT: [[TMP6:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0			; CHECK-NEXT: [[TMP5:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0
	; CHECK-NEXT: [[TMP7:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 1			; CHECK-NEXT: [[TMP6:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 1
	; CHECK-NEXT: [[TMP8:%.*]] = or <vscale x 4 x i64> [[VEC_IND]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP7:%.*]] = or <vscale x 4 x i64> [[VEC_IND]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[TMP9:%.*]] = add nsw <vscale x 4 x i32> [[TMP6]], [[BROADCAST_SPLAT]]			; CHECK-NEXT: [[TMP8:%.*]] = add nsw <vscale x 4 x i32> [[TMP5]], [[BROADCAST_SPLAT]]
	; CHECK-NEXT: [[TMP10:%.*]] = trunc <vscale x 4 x i32> [[TMP9]] to <vscale x 4 x i16>			; CHECK-NEXT: [[TMP9:%.*]] = trunc <vscale x 4 x i32> [[TMP8]] to <vscale x 4 x i16>
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds [1024 x i16], ptr @CD_i16, i64 0, <vscale x 4 x i64> [[VEC_IND]]			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds [1024 x i16], ptr @CD_i16, i64 0, <vscale x 4 x i64> [[VEC_IND]]
	; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i16.nxv4p0(<vscale x 4 x i16> [[TMP10]], <vscale x 4 x ptr> [[TMP11]], i32 2, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))			; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i16.nxv4p0(<vscale x 4 x i16> [[TMP9]], <vscale x 4 x ptr> [[TMP10]], i32 2, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))
	; CHECK-NEXT: [[TMP12:%.*]] = mul nsw <vscale x 4 x i32> [[TMP7]], [[BROADCAST_SPLAT2]]			; CHECK-NEXT: [[TMP11:%.*]] = mul nsw <vscale x 4 x i32> [[TMP6]], [[BROADCAST_SPLAT2]]
	; CHECK-NEXT: [[TMP13:%.*]] = trunc <vscale x 4 x i32> [[TMP12]] to <vscale x 4 x i16>			; CHECK-NEXT: [[TMP12:%.*]] = trunc <vscale x 4 x i32> [[TMP11]] to <vscale x 4 x i16>
	; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds [1024 x i16], ptr @CD_i16, i64 0, <vscale x 4 x i64> [[TMP8]]			; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds [1024 x i16], ptr @CD_i16, i64 0, <vscale x 4 x i64> [[TMP7]]
	; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i16.nxv4p0(<vscale x 4 x i16> [[TMP13]], <vscale x 4 x ptr> [[TMP14]], i32 2, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))			; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i16.nxv4p0(<vscale x 4 x i16> [[TMP12]], <vscale x 4 x ptr> [[TMP13]], i32 2, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))
	; CHECK-NEXT: [[TMP15:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP14:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP16:%.*]] = shl nuw nsw i64 [[TMP15]], 2			; CHECK-NEXT: [[TMP15:%.*]] = shl nuw nsw i64 [[TMP14]], 2
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP16]]			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP15]]
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]
	; CHECK-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP16:%.*]] = icmp eq i64 [[INDEX_NEXT]], 512
	; CHECK-NEXT: br i1 [[TMP17]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N_NOT:%.*]] = icmp eq i64 [[N_VEC]], 0			; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
	; CHECK-NEXT: br i1 [[CMP_N_NOT]], label [[SCALAR_PH]], label [[FOR_END:%.*]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ poison, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: br i1 poison, label [[FOR_BODY]], label [[FOR_END]], !llvm.loop [[LOOP7:![0-9]+]]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [1024 x i32], ptr @AB, i64 0, i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP18:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
	; CHECK-NEXT: [[TMP19:%.*]] = or i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds [1024 x i32], ptr @AB, i64 0, i64 [[TMP19]]
	; CHECK-NEXT: [[TMP20:%.*]] = load i32, ptr [[ARRAYIDX2]], align 4
	; CHECK-NEXT: [[ADD3:%.*]] = add nsw i32 [[TMP18]], [[C]]
	; CHECK-NEXT: [[CONV:%.*]] = trunc i32 [[ADD3]] to i16
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds [1024 x i16], ptr @CD_i16, i64 0, i64 [[INDVARS_IV]]
	; CHECK-NEXT: store i16 [[CONV]], ptr [[ARRAYIDX5]], align 2
	; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP20]], [[D]]
	; CHECK-NEXT: [[CONV6:%.*]] = trunc i32 [[MUL]] to i16
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds [1024 x i16], ptr @CD_i16, i64 0, i64 [[TMP19]]
	; CHECK-NEXT: store i16 [[CONV6]], ptr [[ARRAYIDX9]], align 2
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 2
	; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INDVARS_IV]], 1022
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END]], !llvm.loop [[LOOP7:![0-9]+]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %entry, %for.body			for.body: ; preds = %entry, %for.body
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]

	%struct.ST6 = type { i32, i32, i32, i32, i32, i32 }			%struct.ST6 = type { i32, i32, i32, i32, i32, i32 }

	define i32 @test_struct_load6(%struct.ST6* %S) #1 {			define i32 @test_struct_load6(%struct.ST6* %S) #1 {
	; CHECK-LABEL: @test_struct_load6(			; CHECK-LABEL: @test_struct_load6(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
	; CHECK-NEXT: [[DOTNEG:%.*]] = mul nuw nsw i64 [[TMP0]], 2044			; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[DOTNEG]], 1024			; CHECK-NEXT: [[TMP2:%.*]] = shl nuw nsw i64 [[TMP1]], 2
	; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()			; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP2]], i64 0
	; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP3:%.*]] = shl nuw nsw i64 [[TMP2]], 2
	; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP3]], i64 0
	; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[TMP1]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[TMP0]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <vscale x 4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP15:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <vscale x 4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP14:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [[STRUCT_ST6:%.*]], ptr [[S:%.*]], <vscale x 4 x i64> [[VEC_IND]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds [[STRUCT_ST6:%.*]], ptr [[S:%.*]], <vscale x 4 x i64> [[VEC_IND]], i32 0
	; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP4]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP3]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds [[STRUCT_ST6]], ptr [[S]], <vscale x 4 x i64> [[VEC_IND]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [[STRUCT_ST6]], ptr [[S]], <vscale x 4 x i64> [[VEC_IND]], i32 1
	; CHECK-NEXT: [[WIDE_MASKED_GATHER1:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP5]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[WIDE_MASKED_GATHER1:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP4]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [[STRUCT_ST6]], ptr [[S]], <vscale x 4 x i64> [[VEC_IND]], i32 2			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds [[STRUCT_ST6]], ptr [[S]], <vscale x 4 x i64> [[VEC_IND]], i32 2
	; CHECK-NEXT: [[WIDE_MASKED_GATHER2:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP6]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[WIDE_MASKED_GATHER2:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP5]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds [[STRUCT_ST6]], ptr [[S]], <vscale x 4 x i64> [[VEC_IND]], i32 3			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [[STRUCT_ST6]], ptr [[S]], <vscale x 4 x i64> [[VEC_IND]], i32 3
	; CHECK-NEXT: [[WIDE_MASKED_GATHER3:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP7]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[WIDE_MASKED_GATHER3:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP6]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [[STRUCT_ST6]], ptr [[S]], <vscale x 4 x i64> [[VEC_IND]], i32 4			; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds [[STRUCT_ST6]], ptr [[S]], <vscale x 4 x i64> [[VEC_IND]], i32 4
	; CHECK-NEXT: [[WIDE_MASKED_GATHER4:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP8]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[WIDE_MASKED_GATHER4:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP7]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)
	; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds [[STRUCT_ST6]], ptr [[S]], <vscale x 4 x i64> [[VEC_IND]], i32 5			; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [[STRUCT_ST6]], ptr [[S]], <vscale x 4 x i64> [[VEC_IND]], i32 5
	; CHECK-NEXT: [[WIDE_MASKED_GATHER5:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP9]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)			; CHECK-NEXT: [[WIDE_MASKED_GATHER5:%.*]] = call <vscale x 4 x i32> @llvm.masked.gather.nxv4i32.nxv4p0(<vscale x 4 x ptr> [[TMP8]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i32> poison)
	; CHECK-NEXT: [[TMP10:%.*]] = add <vscale x 4 x i32> [[WIDE_MASKED_GATHER]], [[VEC_PHI]]			; CHECK-NEXT: [[TMP9:%.*]] = add <vscale x 4 x i32> [[WIDE_MASKED_GATHER]], [[VEC_PHI]]
	; CHECK-NEXT: [[TMP11:%.*]] = add <vscale x 4 x i32> [[TMP10]], [[WIDE_MASKED_GATHER2]]			; CHECK-NEXT: [[TMP10:%.*]] = add <vscale x 4 x i32> [[TMP9]], [[WIDE_MASKED_GATHER2]]
	; CHECK-NEXT: [[TMP12:%.*]] = add <vscale x 4 x i32> [[WIDE_MASKED_GATHER1]], [[WIDE_MASKED_GATHER3]]			; CHECK-NEXT: [[TMP11:%.*]] = add <vscale x 4 x i32> [[WIDE_MASKED_GATHER1]], [[WIDE_MASKED_GATHER3]]
	; CHECK-NEXT: [[TMP13:%.*]] = add <vscale x 4 x i32> [[TMP12]], [[WIDE_MASKED_GATHER4]]			; CHECK-NEXT: [[TMP12:%.*]] = add <vscale x 4 x i32> [[TMP11]], [[WIDE_MASKED_GATHER4]]
	; CHECK-NEXT: [[TMP14:%.*]] = add <vscale x 4 x i32> [[TMP13]], [[WIDE_MASKED_GATHER5]]			; CHECK-NEXT: [[TMP13:%.*]] = add <vscale x 4 x i32> [[TMP12]], [[WIDE_MASKED_GATHER5]]
	; CHECK-NEXT: [[TMP15]] = sub <vscale x 4 x i32> [[TMP11]], [[TMP14]]			; CHECK-NEXT: [[TMP14]] = sub <vscale x 4 x i32> [[TMP10]], [[TMP13]]
	; CHECK-NEXT: [[TMP16:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP15:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP17:%.*]] = shl nuw nsw i64 [[TMP16]], 2			; CHECK-NEXT: [[TMP16:%.*]] = shl nuw nsw i64 [[TMP15]], 2
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP17]]			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP16]]
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]
	; CHECK-NEXT: [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024
	; CHECK-NEXT: br i1 [[TMP18]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP17]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[TMP19:%.*]] = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> [[TMP15]])			; CHECK-NEXT: [[TMP18:%.*]] = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> [[TMP14]])
	; CHECK-NEXT: [[CMP_N_NOT:%.*]] = icmp eq i64 [[N_VEC]], 0			; CHECK-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
	; CHECK-NEXT: br i1 [[CMP_N_NOT]], label [[SCALAR_PH]], label [[FOR_COND_CLEANUP:%.*]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ poison, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ [[TMP19]], [[MIDDLE_BLOCK]] ], [ poison, [[ENTRY]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: br i1 poison, label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
	; CHECK-NEXT: [[R_041:%.*]] = phi i32 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[SUB14:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[X:%.*]] = getelementptr inbounds [[STRUCT_ST6]], ptr [[S]], i64 [[INDVARS_IV]], i32 0
	; CHECK-NEXT: [[TMP20:%.*]] = load i32, ptr [[X]], align 4
	; CHECK-NEXT: [[Y:%.*]] = getelementptr inbounds [[STRUCT_ST6]], ptr [[S]], i64 [[INDVARS_IV]], i32 1
	; CHECK-NEXT: [[TMP21:%.*]] = load i32, ptr [[Y]], align 4
	; CHECK-NEXT: [[Z:%.*]] = getelementptr inbounds [[STRUCT_ST6]], ptr [[S]], i64 [[INDVARS_IV]], i32 2
	; CHECK-NEXT: [[TMP22:%.*]] = load i32, ptr [[Z]], align 4
	; CHECK-NEXT: [[W:%.*]] = getelementptr inbounds [[STRUCT_ST6]], ptr [[S]], i64 [[INDVARS_IV]], i32 3
	; CHECK-NEXT: [[TMP23:%.*]] = load i32, ptr [[W]], align 4
	; CHECK-NEXT: [[A:%.*]] = getelementptr inbounds [[STRUCT_ST6]], ptr [[S]], i64 [[INDVARS_IV]], i32 4
	; CHECK-NEXT: [[TMP24:%.*]] = load i32, ptr [[A]], align 4
	; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds [[STRUCT_ST6]], ptr [[S]], i64 [[INDVARS_IV]], i32 5
	; CHECK-NEXT: [[TMP25:%.*]] = load i32, ptr [[B]], align 4
	; CHECK-NEXT: [[DOTNEG36:%.*]] = add i32 [[TMP20]], [[R_041]]
	; CHECK-NEXT: [[TMP26:%.*]] = add i32 [[DOTNEG36]], [[TMP22]]
	; CHECK-NEXT: [[TMP27:%.*]] = add i32 [[TMP21]], [[TMP23]]
	; CHECK-NEXT: [[TMP28:%.*]] = add i32 [[TMP27]], [[TMP24]]
	; CHECK-NEXT: [[TMP29:%.*]] = add i32 [[TMP28]], [[TMP25]]
	; CHECK-NEXT: [[SUB14]] = sub i32 [[TMP26]], [[TMP29]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024
	; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: [[SUB14_LCSSA:%.*]] = phi i32 [ [[SUB14]], [[FOR_BODY]] ], [ [[TMP19]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[SUB14_LCSSA:%.*]] = phi i32 [ poison, [[FOR_BODY]] ], [ [[TMP18]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: ret i32 [[SUB14_LCSSA]]			; CHECK-NEXT: ret i32 [[SUB14_LCSSA]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %entry, %for.body			for.body: ; preds = %entry, %for.body
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%r.041 = phi i32 [ 0, %entry ], [ %sub14, %for.body ]			%r.041 = phi i32 [ 0, %entry ], [ %sub14, %for.body ]
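The arithmetic behind the updated `test_struct_load6` and `test_reversed_load2_store2` checks can be sketched in a few lines. The old checks compute `N_VEC` as `(vscale * 2044) & 1024` and take the scalar loop when `N_VEC` is 0. The new checks compare `INDEX_NEXT` against the constant 1024 and branch on `true`, which leaves the scalar loop dead.

This is a minimal sketch under several assumptions: a trip count of 1024, VF = 4 * vscale, and a power-of-two vscale no larger than 16. The `#1` attribute group is not shown in this excerpt, so the upper bound is an assumption. The helper names are hypothetical.

```python
# Sketch: why N_VEC folds to the constant 1024 in the updated CHECK lines.
# Assumptions (not taken from the patch): trip count 1024, VF = 4 * vscale,
# vscale a power of two no larger than 16. Helper names are hypothetical.

TRIP_COUNT = 1024


def n_vec_generic(n: int, vf: int) -> int:
    """Iterations handled by the vector loop: n rounded down to a multiple of vf."""
    return n - n % vf


def n_vec_masked(vscale: int) -> int:
    """The form in the old CHECK lines: (vscale * 2044) & 1024.

    This equals 1024 & -(4 * vscale), which matches n_vec_generic only
    when VF = 4 * vscale is a power of two.
    """
    return (vscale * 2044) & 1024


def remainder_iterations(n: int, vf: int) -> int:
    """Iterations that would be left over for the scalar remainder loop."""
    return n - n_vec_generic(n, vf)


if __name__ == "__main__":
    for vscale in (1, 2, 4, 8, 16):
        vf = 4 * vscale
        # Both forms agree and cover the whole trip count, so nothing is left.
        assert n_vec_masked(vscale) == n_vec_generic(TRIP_COUNT, vf) == TRIP_COUNT
        print(vscale, vf, remainder_iterations(TRIP_COUNT, vf))
    # A non-power-of-two vscale (3, so VF = 12) would leave a real remainder.
    print(3, 12, remainder_iterations(TRIP_COUNT, 12))
```

With vscale = 3, 1024 % 12 leaves 4 scalar iterations. This is why the fold depends on vscale being a power of two, as assumed via D155350 in the summary.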

	%struct.ST2 = type { i32, i32 }			%struct.ST2 = type { i32, i32 }

	define void @test_reversed_load2_store2(%struct.ST2* noalias nocapture readonly %A, %struct.ST2* noalias nocapture %B) #1 {			define void @test_reversed_load2_store2(%struct.ST2* noalias nocapture readonly %A, %struct.ST2* noalias nocapture %B) #1 {
	; CHECK-LABEL: @test_reversed_load2_store2(			; CHECK-LABEL: @test_reversed_load2_store2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 4 x i32> @llvm.experimental.stepvector.nxv4i32()
	; CHECK-NEXT: [[DOTNEG:%.*]] = mul nuw nsw i64 [[TMP0]], 2044			; CHECK-NEXT: [[INDUCTION:%.*]] = sub <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1023, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer), [[TMP0]]
	; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[DOTNEG]], 1024			; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.vscale.i32()
	; CHECK-NEXT: [[IND_END:%.*]] = sub nsw i64 1023, [[N_VEC]]			; CHECK-NEXT: [[DOTNEG:%.*]] = mul nsw i32 [[TMP1]], -4
	; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 4 x i32> @llvm.experimental.stepvector.nxv4i32()			; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[DOTNEG]], i64 0
	; CHECK-NEXT: [[INDUCTION:%.*]] = sub <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1023, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer), [[TMP1]]
	; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.vscale.i32()
	; CHECK-NEXT: [[DOTNEG4:%.*]] = mul nsw i32 [[TMP2]], -4
	; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[DOTNEG4]], i64 0
	; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[DOTSPLATINSERT]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[DOTSPLATINSERT]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i32> [ [[INDUCTION]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i32> [ [[INDUCTION]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[OFFSET_IDX:%.*]] = sub i64 1023, [[INDEX]]			; CHECK-NEXT: [[OFFSET_IDX:%.*]] = sub i64 1023, [[INDEX]]
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds [[STRUCT_ST2:%.*]], ptr [[A:%.*]], i64 [[OFFSET_IDX]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds [[STRUCT_ST2:%.*]], ptr [[A:%.*]], i64 [[OFFSET_IDX]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = call i32 @llvm.vscale.i32()			; CHECK-NEXT: [[TMP3:%.*]] = call i32 @llvm.vscale.i32()
	; CHECK-NEXT: [[TMP5:%.*]] = shl nuw nsw i32 [[TMP4]], 3			; CHECK-NEXT: [[TMP4:%.*]] = shl nuw nsw i32 [[TMP3]], 3
	; CHECK-NEXT: [[TMP6:%.*]] = sub nsw i32 2, [[TMP5]]			; CHECK-NEXT: [[TMP5:%.*]] = sub nsw i32 2, [[TMP4]]
	; CHECK-NEXT: [[TMP7:%.*]] = sext i32 [[TMP6]] to i64			; CHECK-NEXT: [[TMP6:%.*]] = sext i32 [[TMP5]] to i64
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, ptr [[TMP3]], i64 [[TMP7]]			; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, ptr [[TMP2]], i64 [[TMP6]]
	; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[TMP8]], align 4			; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[TMP7]], align 4
	; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])			; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])
	; CHECK-NEXT: [[TMP9:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0			; CHECK-NEXT: [[TMP8:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0
	; CHECK-NEXT: [[REVERSE:%.*]] = call <vscale x 4 x i32> @llvm.experimental.vector.reverse.nxv4i32(<vscale x 4 x i32> [[TMP9]])			; CHECK-NEXT: [[REVERSE:%.*]] = call <vscale x 4 x i32> @llvm.experimental.vector.reverse.nxv4i32(<vscale x 4 x i32> [[TMP8]])
	; CHECK-NEXT: [[TMP10:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 1			; CHECK-NEXT: [[TMP9:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 1
	; CHECK-NEXT: [[REVERSE1:%.*]] = call <vscale x 4 x i32> @llvm.experimental.vector.reverse.nxv4i32(<vscale x 4 x i32> [[TMP10]])			; CHECK-NEXT: [[REVERSE1:%.*]] = call <vscale x 4 x i32> @llvm.experimental.vector.reverse.nxv4i32(<vscale x 4 x i32> [[TMP9]])
	; CHECK-NEXT: [[TMP11:%.*]] = add nsw <vscale x 4 x i32> [[REVERSE]], [[VEC_IND]]			; CHECK-NEXT: [[TMP10:%.*]] = add nsw <vscale x 4 x i32> [[REVERSE]], [[VEC_IND]]
	; CHECK-NEXT: [[TMP12:%.*]] = sub nsw <vscale x 4 x i32> [[REVERSE1]], [[VEC_IND]]			; CHECK-NEXT: [[TMP11:%.*]] = sub nsw <vscale x 4 x i32> [[REVERSE1]], [[VEC_IND]]
	; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds [[STRUCT_ST2]], ptr [[B:%.*]], i64 [[OFFSET_IDX]], i32 1			; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds [[STRUCT_ST2]], ptr [[B:%.*]], i64 [[OFFSET_IDX]], i32 1
	; CHECK-NEXT: [[TMP14:%.*]] = call i32 @llvm.vscale.i32()			; CHECK-NEXT: [[TMP13:%.*]] = call i32 @llvm.vscale.i32()
	; CHECK-NEXT: [[TMP15:%.*]] = shl nuw nsw i32 [[TMP14]], 3			; CHECK-NEXT: [[TMP14:%.*]] = shl nuw nsw i32 [[TMP13]], 3
	; CHECK-NEXT: [[TMP16:%.*]] = sub nsw i32 1, [[TMP15]]			; CHECK-NEXT: [[TMP15:%.*]] = sub nsw i32 1, [[TMP14]]
	; CHECK-NEXT: [[TMP17:%.*]] = sext i32 [[TMP16]] to i64			; CHECK-NEXT: [[TMP16:%.*]] = sext i32 [[TMP15]] to i64
	; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds i32, ptr [[TMP13]], i64 [[TMP17]]			; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds i32, ptr [[TMP12]], i64 [[TMP16]]
	; CHECK-NEXT: [[REVERSE2:%.*]] = call <vscale x 4 x i32> @llvm.experimental.vector.reverse.nxv4i32(<vscale x 4 x i32> [[TMP11]])			; CHECK-NEXT: [[REVERSE2:%.*]] = call <vscale x 4 x i32> @llvm.experimental.vector.reverse.nxv4i32(<vscale x 4 x i32> [[TMP10]])
	; CHECK-NEXT: [[REVERSE3:%.*]] = call <vscale x 4 x i32> @llvm.experimental.vector.reverse.nxv4i32(<vscale x 4 x i32> [[TMP12]])			; CHECK-NEXT: [[REVERSE3:%.*]] = call <vscale x 4 x i32> @llvm.experimental.vector.reverse.nxv4i32(<vscale x 4 x i32> [[TMP11]])
	; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = call <vscale x 8 x i32> @llvm.experimental.vector.interleave2.nxv8i32(<vscale x 4 x i32> [[REVERSE2]], <vscale x 4 x i32> [[REVERSE3]])			; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = call <vscale x 8 x i32> @llvm.experimental.vector.interleave2.nxv8i32(<vscale x 4 x i32> [[REVERSE2]], <vscale x 4 x i32> [[REVERSE3]])
	; CHECK-NEXT: store <vscale x 8 x i32> [[INTERLEAVED_VEC]], ptr [[TMP18]], align 4			; CHECK-NEXT: store <vscale x 8 x i32> [[INTERLEAVED_VEC]], ptr [[TMP17]], align 4
	; CHECK-NEXT: [[TMP19:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP18:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP20:%.*]] = shl nuw nsw i64 [[TMP19]], 2			; CHECK-NEXT: [[TMP19:%.*]] = shl nuw nsw i64 [[TMP18]], 2
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP20]]			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP19]]
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i32> [[VEC_IND]], [[DOTSPLAT]]			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i32> [[VEC_IND]], [[DOTSPLAT]]
	; CHECK-NEXT: [[TMP21:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024
	; CHECK-NEXT: br i1 [[TMP21]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N_NOT:%.*]] = icmp eq i64 [[N_VEC]], 0			; CHECK-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
	; CHECK-NEXT: br i1 [[CMP_N_NOT]], label [[SCALAR_PH]], label [[FOR_COND_CLEANUP:%.*]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ poison, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: br i1 poison, label [[FOR_BODY]], label [[FOR_COND_CLEANUP]], !llvm.loop [[LOOP11:![0-9]+]]
	; CHECK-NEXT: [[X:%.*]] = getelementptr inbounds [[STRUCT_ST2]], ptr [[A]], i64 [[INDVARS_IV]], i32 0
	; CHECK-NEXT: [[LOAD1:%.*]] = load i32, ptr [[X]], align 4
	; CHECK-NEXT: [[TRUNC:%.*]] = trunc i64 [[INDVARS_IV]] to i32
	; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[LOAD1]], [[TRUNC]]
	; CHECK-NEXT: [[Y:%.*]] = getelementptr inbounds [[STRUCT_ST2]], ptr [[A]], i64 [[INDVARS_IV]], i32 1
	; CHECK-NEXT: [[LOAD2:%.*]] = load i32, ptr [[Y]], align 4
	; CHECK-NEXT: [[SUB:%.*]] = sub nsw i32 [[LOAD2]], [[TRUNC]]
	; CHECK-NEXT: [[X5:%.*]] = getelementptr inbounds [[STRUCT_ST2]], ptr [[B]], i64 [[INDVARS_IV]], i32 0
	; CHECK-NEXT: store i32 [[ADD]], ptr [[X5]], align 4
	; CHECK-NEXT: [[Y8:%.*]] = getelementptr inbounds [[STRUCT_ST2]], ptr [[B]], i64 [[INDVARS_IV]], i32 1
	; CHECK-NEXT: store i32 [[SUB]], ptr [[Y8]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nsw i64 [[INDVARS_IV]], -1
	; CHECK-NEXT: [[CMP:%.*]] = icmp sgt i64 [[INDVARS_IV]], 0
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_COND_CLEANUP]], !llvm.loop [[LOOP11:![0-9]+]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.cond.cleanup: ; preds = %for.body			for.cond.cleanup: ; preds = %for.body
	ret void			ret void

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
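A similar fold happens in the `even_load_static_tc` vector preheader. The old checks do three things:

- compute `N_MOD_VF = (VF + 1023) & 512`, with VF = `vscale << 2`;
- substitute a full VF when that remainder is 0, presumably because this interleaved access needs a scalar epilogue;
- set `N_VEC = 512 - remainder`.

The new checks reduce this to `N_VEC = 512 - VF`.

This is a minimal sketch of why, assuming a power-of-two vscale no larger than 16. The helper names are hypothetical.

```python
# Sketch: why the select in even_load_static_tc's old vector.ph folds away.
# Assumptions (not taken from the patch): trip count 512, VF = 4 * vscale,
# vscale a power of two no larger than 16. Helper names are hypothetical.

TRIP_COUNT = 512


def n_mod_vf_masked(vf: int) -> int:
    """Old CHECK form (VF + 1023) & 512.

    For power-of-two VF up to 1024 this equals (VF - 1) & 512, i.e. 512 urem VF.
    """
    return (vf + 1023) & 512


def n_vec_old(vf: int) -> int:
    """Old CHECK sequence: keep a full VF for the scalar epilogue when the
    remainder is zero, then subtract the remainder from the trip count."""
    n_mod_vf = n_mod_vf_masked(vf)
    remainder = vf if n_mod_vf == 0 else n_mod_vf
    return TRIP_COUNT - remainder


def n_vec_new(vf: int) -> int:
    """New CHECK form: 512 - VF."""
    return TRIP_COUNT - vf


if __name__ == "__main__":
    for vscale in (1, 2, 4, 8, 16):
        vf = 4 * vscale
        # The masked remainder is always 0, so the select always picks VF.
        assert n_mod_vf_masked(vf) == TRIP_COUNT % vf == 0
        assert n_vec_old(vf) == n_vec_new(vf)
        # IND_END = N_VEC << 1, since the original induction variable steps by 2.
        print(vf, n_vec_new(vf), n_vec_new(vf) << 1)
```

Because the `select` always yields VF under these assumptions, InstSimplify can drop the `icmp`/`select` pair. That leaves the single `sub` seen in the new column.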

	define void @even_load_static_tc(i32* noalias nocapture readonly %A, i32* noalias nocapture %B) #1 {			define void @even_load_static_tc(i32* noalias nocapture readonly %A, i32* noalias nocapture %B) #1 {
	; CHECK-LABEL: @even_load_static_tc(			; CHECK-LABEL: @even_load_static_tc(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP1:%.*]] = shl nuw nsw i64 [[TMP0]], 2			; CHECK-NEXT: [[TMP1:%.*]] = shl nuw nsw i64 [[TMP0]], 2
	; CHECK-NEXT: [[TMP2:%.*]] = add nuw nsw i64 [[TMP1]], 1023			; CHECK-NEXT: [[N_VEC:%.*]] = sub nuw nsw i64 512, [[TMP1]]
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = and i64 [[TMP2]], 512			; CHECK-NEXT: [[IND_END:%.*]] = shl nuw nsw i64 [[N_VEC]], 1
	; CHECK-NEXT: [[TMP3:%.*]] = icmp eq i64 [[N_MOD_VF]], 0
	; CHECK-NEXT: [[TMP4:%.*]] = select i1 [[TMP3]], i64 [[TMP1]], i64 [[N_MOD_VF]]
	; CHECK-NEXT: [[N_VEC:%.*]] = sub nsw i64 512, [[TMP4]]
	; CHECK-NEXT: [[IND_END:%.*]] = shl nsw i64 [[N_VEC]], 1
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX]], 1			; CHECK-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX]], 1
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[OFFSET_IDX]]			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[OFFSET_IDX]]
	; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[TMP5]], align 4			; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[TMP2]], align 4
	; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])			; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])
	; CHECK-NEXT: [[TMP6:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0			; CHECK-NEXT: [[TMP3:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0
	; CHECK-NEXT: [[TMP7:%.*]] = shl nsw <vscale x 4 x i32> [[TMP6]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP4:%.*]] = shl nsw <vscale x 4 x i32> [[TMP3]], shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[TMP8:%.*]] = and i64 [[INDEX]], 9223372036854775804			; CHECK-NEXT: [[TMP5:%.*]] = and i64 [[INDEX]], 9223372036854775804
	; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, ptr [[B:%.*]], i64 [[TMP8]]			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, ptr [[B:%.*]], i64 [[TMP5]]
	; CHECK-NEXT: store <vscale x 4 x i32> [[TMP7]], ptr [[TMP9]], align 4			; CHECK-NEXT: store <vscale x 4 x i32> [[TMP4]], ptr [[TMP6]], align 4
	; CHECK-NEXT: [[TMP10:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP7:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP11:%.*]] = shl nuw nsw i64 [[TMP10]], 2			; CHECK-NEXT: [[TMP8:%.*]] = shl nuw nsw i64 [[TMP7]], 2
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP11]]			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP8]]
	; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: br label [[SCALAR_PH]]			; CHECK-NEXT: br label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ poison, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ poison, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: for.body:			; CHECK: for.body:
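The left column of @even_load_static_tc computes the vector trip count through [[N_MOD_VF]] = and (VF + 1023), 512 and a select that substitutes VF when that remainder is zero, so the scalar epilogue always gets at least one iteration. A minimal C++ sketch of both columns, assuming VF = 4 * vscale (the shl ..., 2) and vscale a power of two no larger than 16 (the vscale_range behind attribute #1 is not visible in this view, so the bound is an assumption), shows why the new checks can use sub nuw nsw i64 512, [[TMP1]] directly. The function names are hypothetical.

```cpp
#include <cstdint>

using u64 = std::uint64_t;

// Left column of @even_load_static_tc: VF = vscale << 2, trip count 512.
constexpr u64 oldNVec(u64 vscale) {
  u64 vf = vscale << 2;                  // [[TMP1]]
  u64 nModVF = (vf + 1023) & 512;        // [[N_MOD_VF]], equals 512 % VF for a power-of-two VF
  u64 rem = (nModVF == 0) ? vf : nModVF; // [[TMP4]]: never leave zero scalar iterations
  return 512 - rem;                      // [[N_VEC]]
}

// Right column: [[N_VEC]] = sub nuw nsw i64 512, [[TMP1]].
constexpr u64 newNVec(u64 vscale) { return 512 - (vscale << 2); }

// Compare both forms for the assumed power-of-two vscale values 1, 2, 4, 8, 16.
constexpr bool foldHolds() {
  for (u64 v = 1; v <= 16; v <<= 1)
    if (oldNVec(v) != newNVec(v)) return false;
  return true;
}

static_assert(foldHolds(), "old and new N_VEC agree for power-of-two vscale");
```

For every assumed vscale, VF - 1 is below 512, so (VF + 1023) & 512 is 0 and the select always picks VF.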
	; }			; }

	%pair = type { i64, i64 }			%pair = type { i64, i64 }
	define void @load_gap_reverse(%pair* noalias nocapture readonly %P1, %pair* noalias nocapture readonly %P2, i64 %X) #1 {			define void @load_gap_reverse(%pair* noalias nocapture readonly %P1, %pair* noalias nocapture readonly %P2, i64 %X) #1 {
	; CHECK-LABEL: @load_gap_reverse(			; CHECK-LABEL: @load_gap_reverse(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
	; CHECK-NEXT: [[DOTNEG:%.*]] = mul nuw nsw i64 [[TMP0]], 2044			; CHECK-NEXT: [[INDUCTION:%.*]] = sub <vscale x 4 x i64> shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1023, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer), [[TMP0]]
	; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[DOTNEG]], 1024			; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[IND_END:%.*]] = sub nsw i64 1023, [[N_VEC]]			; CHECK-NEXT: [[DOTNEG:%.*]] = mul nsw i64 [[TMP1]], -4
	; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()			; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[DOTNEG]], i64 0
	; CHECK-NEXT: [[INDUCTION:%.*]] = sub <vscale x 4 x i64> shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 1023, i64 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer), [[TMP1]]
	; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[DOTNEG1:%.*]] = mul nsw i64 [[TMP2]], -4
	; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[DOTNEG1]], i64 0
	; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[X:%.*]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[X:%.*]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[INDUCTION]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[INDUCTION]], [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP3:%.*]] = add nsw <vscale x 4 x i64> [[BROADCAST_SPLAT]], [[VEC_IND]]			; CHECK-NEXT: [[TMP2:%.*]] = add nsw <vscale x 4 x i64> [[BROADCAST_SPLAT]], [[VEC_IND]]
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [[PAIR:%.*]], ptr [[P1:%.*]], <vscale x 4 x i64> [[VEC_IND]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds [[PAIR:%.*]], ptr [[P1:%.*]], <vscale x 4 x i64> [[VEC_IND]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds [[PAIR]], ptr [[P2:%.*]], <vscale x 4 x i64> [[VEC_IND]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [[PAIR]], ptr [[P2:%.*]], <vscale x 4 x i64> [[VEC_IND]], i32 1
	; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x i64> @llvm.masked.gather.nxv4i64.nxv4p0(<vscale x 4 x ptr> [[TMP5]], i32 8, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i64> poison)			; CHECK-NEXT: [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 4 x i64> @llvm.masked.gather.nxv4i64.nxv4p0(<vscale x 4 x ptr> [[TMP4]], i32 8, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x i64> poison)
	; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <vscale x 4 x i64> [[WIDE_MASKED_GATHER]], [[VEC_IND]]			; CHECK-NEXT: [[TMP5:%.*]] = sub nsw <vscale x 4 x i64> [[WIDE_MASKED_GATHER]], [[VEC_IND]]
	; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i64.nxv4p0(<vscale x 4 x i64> [[TMP3]], <vscale x 4 x ptr> [[TMP4]], i32 8, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))			; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i64.nxv4p0(<vscale x 4 x i64> [[TMP2]], <vscale x 4 x ptr> [[TMP3]], i32 8, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))
	; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i64.nxv4p0(<vscale x 4 x i64> [[TMP6]], <vscale x 4 x ptr> [[TMP5]], i32 8, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))			; CHECK-NEXT: call void @llvm.masked.scatter.nxv4i64.nxv4p0(<vscale x 4 x i64> [[TMP5]], <vscale x 4 x ptr> [[TMP4]], i32 8, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer))
	; CHECK-NEXT: [[TMP7:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP6:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP8:%.*]] = shl nuw nsw i64 [[TMP7]], 2			; CHECK-NEXT: [[TMP7:%.*]] = shl nuw nsw i64 [[TMP6]], 2
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP8]]			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP7]]
	; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]			; CHECK-NEXT: [[VEC_IND_NEXT]] = add <vscale x 4 x i64> [[VEC_IND]], [[DOTSPLAT]]
	; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024
	; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP16:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N_NOT:%.*]] = icmp eq i64 [[N_VEC]], 0			; CHECK-NEXT: br i1 true, label [[FOR_EXIT:%.*]], label [[SCALAR_PH]]
	; CHECK-NEXT: br i1 [[CMP_N_NOT]], label [[SCALAR_PH]], label [[FOR_EXIT:%.*]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ poison, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[I_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: br i1 poison, label [[FOR_BODY]], label [[FOR_EXIT]], !llvm.loop [[LOOP17:![0-9]+]]
	; CHECK-NEXT: [[I_NEXT]] = add nsw i64 [[I]], -1
	; CHECK-NEXT: [[COND:%.*]] = icmp sgt i64 [[I]], 0
	; CHECK-NEXT: br i1 [[COND]], label [[FOR_BODY]], label [[FOR_EXIT]], !llvm.loop [[LOOP17:![0-9]+]]
	; CHECK: for.exit:			; CHECK: for.exit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%i = phi i64 [ 1023, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 1023, %entry ], [ %i.next, %for.body ]
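In @load_gap_reverse the removed checks compute [[N_VEC]] = and (mul nuw nsw vscale, 2044), 1024. Because 2044 = 2048 - 4, the product is congruent to -(4 * vscale) modulo 2048, and the mask keeps only bit 10, so the expression is 1024 & -VF, i.e. 1024 - 1024 % VF for a power-of-two VF. The sketch below (hypothetical helpers; vscale assumed to be a power of two in [1, 16]) checks that this is always the constant 1024 that the new icmp eq i64 [[INDEX_NEXT]], 1024 uses.

```cpp
#include <cstdint>

using u64 = std::uint64_t;

// Removed [[N_VEC]] in @load_gap_reverse: and (mul nuw nsw vscale, 2044), 1024.
constexpr u64 maskedNVec(u64 vscale) { return (vscale * 2044) & 1024; }

// Textbook vector trip count n - n % VF with n = 1024 and VF = 4 * vscale.
constexpr u64 plainNVec(u64 vscale) { return 1024 - 1024 % (4 * vscale); }

// For each assumed vscale (powers of two up to 16) both forms agree and
// equal the constant 1024 used by the new icmp in vector.body.
constexpr bool allPow2Agree() {
  for (u64 v = 1; v <= 16; v <<= 1) {
    if (maskedNVec(v) != plainNVec(v)) return false;
    if (maskedNVec(v) != 1024) return false;
  }
  return true;
}

static_assert(allPow2Agree(), "N_VEC folds to 1024");
```

With [[N_VEC]] fixed at 1024, the old middle-block test icmp eq i64 [[N_VEC]], 0 is always false, which matches the new br i1 true to for.exit. The upper bound on vscale matters: for vscale = 512 (VF = 2048) both helpers return 0.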
	; }			; }


	define void @mixed_load2_store2(i32* noalias nocapture readonly %A, i32* noalias nocapture %B) #1 {			define void @mixed_load2_store2(i32* noalias nocapture readonly %A, i32* noalias nocapture %B) #1 {
	; CHECK-LABEL: @mixed_load2_store2(			; CHECK-LABEL: @mixed_load2_store2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[DOTNEG:%.*]] = mul nuw nsw i64 [[TMP0]], 1020
	; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[DOTNEG]], 512
	; CHECK-NEXT: [[IND_END:%.*]] = shl nuw nsw i64 [[N_VEC]], 1
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX]], 1			; CHECK-NEXT: [[OFFSET_IDX:%.*]] = shl i64 [[INDEX]], 1
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[OFFSET_IDX]]			; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[OFFSET_IDX]]
	; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[TMP1]], align 4			; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[TMP0]], align 4
	; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])			; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])
	; CHECK-NEXT: [[TMP2:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0			; CHECK-NEXT: [[TMP1:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0
	; CHECK-NEXT: [[TMP3:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 1			; CHECK-NEXT: [[TMP2:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 1
	; CHECK-NEXT: [[TMP4:%.*]] = or i64 [[OFFSET_IDX]], 1			; CHECK-NEXT: [[TMP3:%.*]] = or i64 [[OFFSET_IDX]], 1
	; CHECK-NEXT: [[TMP5:%.*]] = mul nsw <vscale x 4 x i32> [[TMP3]], [[TMP2]]			; CHECK-NEXT: [[TMP4:%.*]] = mul nsw <vscale x 4 x i32> [[TMP2]], [[TMP1]]
	; CHECK-NEXT: [[STRIDED_VEC2:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])			; CHECK-NEXT: [[STRIDED_VEC2:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])
	; CHECK-NEXT: [[TMP6:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC2]], 0			; CHECK-NEXT: [[TMP5:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC2]], 0
	; CHECK-NEXT: [[TMP7:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC2]], 1			; CHECK-NEXT: [[TMP6:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC2]], 1
	; CHECK-NEXT: [[TMP8:%.*]] = add nsw <vscale x 4 x i32> [[TMP7]], [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = add nsw <vscale x 4 x i32> [[TMP6]], [[TMP5]]
	; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, ptr [[B:%.*]], i64 [[TMP4]]			; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, ptr [[B:%.*]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, ptr [[TMP9]], i64 -1			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, ptr [[TMP8]], i64 -1
	; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = call <vscale x 8 x i32> @llvm.experimental.vector.interleave2.nxv8i32(<vscale x 4 x i32> [[TMP5]], <vscale x 4 x i32> [[TMP8]])			; CHECK-NEXT: [[INTERLEAVED_VEC:%.*]] = call <vscale x 8 x i32> @llvm.experimental.vector.interleave2.nxv8i32(<vscale x 4 x i32> [[TMP4]], <vscale x 4 x i32> [[TMP7]])
	; CHECK-NEXT: store <vscale x 8 x i32> [[INTERLEAVED_VEC]], ptr [[TMP10]], align 4			; CHECK-NEXT: store <vscale x 8 x i32> [[INTERLEAVED_VEC]], ptr [[TMP9]], align 4
	; CHECK-NEXT: [[TMP11:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP10:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP12:%.*]] = shl nuw nsw i64 [[TMP11]], 2			; CHECK-NEXT: [[TMP11:%.*]] = shl nuw nsw i64 [[TMP10]], 2
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP12]]			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP11]]
	; CHECK-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], 512
	; CHECK-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP18:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP12]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP18:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N_NOT:%.*]] = icmp eq i64 [[N_VEC]], 0			; CHECK-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
	; CHECK-NEXT: br i1 [[CMP_N_NOT]], label [[SCALAR_PH]], label [[FOR_COND_CLEANUP:%.*]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ poison, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: br i1 poison, label [[FOR_BODY]], label [[FOR_COND_CLEANUP]], !llvm.loop [[LOOP19:![0-9]+]]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[LOAD1:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
	; CHECK-NEXT: [[OR:%.*]] = or i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[OR]]
	; CHECK-NEXT: [[LOAD2:%.*]] = load i32, ptr [[ARRAYIDX2]], align 4
	; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[LOAD2]], [[LOAD1]]
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: store i32 [[MUL]], ptr [[ARRAYIDX4]], align 4
	; CHECK-NEXT: [[LOAD3:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
	; CHECK-NEXT: [[ADD10:%.*]] = add nsw i32 [[LOAD2]], [[LOAD3]]
	; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds i32, ptr [[B]], i64 [[OR]]
	; CHECK-NEXT: store i32 [[ADD10]], ptr [[ARRAYIDX13]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 2
	; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[INDVARS_IV]], 1022
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_COND_CLEANUP]], !llvm.loop [[LOOP19:![0-9]+]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.cond.cleanup: ; preds = %for.body			for.cond.cleanup: ; preds = %for.body
	ret void			ret void

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
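@mixed_load2_store2 shows the same shape with different constants: and (mul nuw nsw vscale, 1020), 512, where the scalar loop runs 512 times (the induction steps by 2 up to 1022). Comparing the tests in this diff suggests the multiplier is 2n - k for a power-of-two trip count n and VF = k * vscale; that pattern is a reading of these CHECK lines, not something the patch states. A small parameterised sketch, under the same assumption that vscale is a power of two no larger than 16 and with hypothetical helper names, checks it numerically.

```cpp
#include <cstdint>

using u64 = std::uint64_t;

// Model of the removed pattern: and (mul vscale, 2n - k), n,
// for a power-of-two trip count n and VF = k * vscale.
constexpr u64 maskedNVec(u64 n, u64 k, u64 vscale) {
  return (vscale * (2 * n - k)) & n;
}

// Reference value n - n % VF.
constexpr u64 plainNVec(u64 n, u64 k, u64 vscale) {
  return n - n % (k * vscale);
}

constexpr bool agreesForPow2VScale(u64 n, u64 k) {
  for (u64 v = 1; v <= 16; v <<= 1)
    if (maskedNVec(n, k, v) != plainNVec(n, k, v)) return false;
  return true;
}

// Multipliers taken from the removed CHECK lines in this diff.
static_assert(2 * 512 - 4 == 1020, "@mixed_load2_store2");
static_assert(2 * 1024 - 4 == 2044, "@load_gap_reverse, @int_float_struct");
static_assert(2 * 1024 - 2 == 2046, "@phi_used_in_vector_compare_and_scalar_indvar_update_and_store");
static_assert(agreesForPow2VScale(512, 4), "masked form equals n - n % VF");
static_assert(maskedNVec(512, 4, 16) == 512, "new checks compare INDEX_NEXT with 512");
```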
	@SA = common global i32 0, align 4			@SA = common global i32 0, align 4
	@SB = common global float 0.000000e+00, align 4			@SB = common global float 0.000000e+00, align 4

	define void @int_float_struct(%struct.IntFloat* nocapture readonly %p) #0 {			define void @int_float_struct(%struct.IntFloat* nocapture readonly %p) #0 {
	; CHECK-LABEL: @int_float_struct(			; CHECK-LABEL: @int_float_struct(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[DOTNEG:%.*]] = mul nuw nsw i64 [[TMP0]], 2044
	; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[DOTNEG]], 1024
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <vscale x 4 x float> [ insertelement (<vscale x 4 x float> zeroinitializer, float undef, i32 0), [[VECTOR_PH]] ], [ [[TMP6:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <vscale x 4 x float> [ insertelement (<vscale x 4 x float> zeroinitializer, float undef, i32 0), [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[VEC_PHI1:%.*]] = phi <vscale x 4 x i32> [ insertelement (<vscale x 4 x i32> zeroinitializer, i32 undef, i32 0), [[VECTOR_PH]] ], [ [[TMP5:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[VEC_PHI1:%.*]] = phi <vscale x 4 x i32> [ insertelement (<vscale x 4 x i32> zeroinitializer, i32 undef, i32 0), [[VECTOR_PH]] ], [ [[TMP4:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds [[STRUCT_INTFLOAT:%.*]], ptr [[P:%.*]], i64 [[INDEX]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds [[STRUCT_INTFLOAT:%.*]], ptr [[P:%.*]], i64 [[INDEX]], i32 0
	; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[TMP1]], align 4			; CHECK-NEXT: [[WIDE_VEC:%.*]] = load <vscale x 8 x i32>, ptr [[TMP0]], align 4
	; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])			; CHECK-NEXT: [[STRIDED_VEC:%.*]] = call { <vscale x 4 x i32>, <vscale x 4 x i32> } @llvm.experimental.vector.deinterleave2.nxv8i32(<vscale x 8 x i32> [[WIDE_VEC]])
	; CHECK-NEXT: [[TMP2:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0			; CHECK-NEXT: [[TMP1:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 0
	; CHECK-NEXT: [[TMP3:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 1			; CHECK-NEXT: [[TMP2:%.*]] = extractvalue { <vscale x 4 x i32>, <vscale x 4 x i32> } [[STRIDED_VEC]], 1
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast <vscale x 4 x i32> [[TMP3]] to <vscale x 4 x float>			; CHECK-NEXT: [[TMP3:%.*]] = bitcast <vscale x 4 x i32> [[TMP2]] to <vscale x 4 x float>
	; CHECK-NEXT: [[TMP5]] = add <vscale x 4 x i32> [[TMP2]], [[VEC_PHI1]]			; CHECK-NEXT: [[TMP4]] = add <vscale x 4 x i32> [[TMP1]], [[VEC_PHI1]]
	; CHECK-NEXT: [[TMP6]] = fadd fast <vscale x 4 x float> [[VEC_PHI]], [[TMP4]]			; CHECK-NEXT: [[TMP5]] = fadd fast <vscale x 4 x float> [[VEC_PHI]], [[TMP3]]
	; CHECK-NEXT: [[TMP7:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP6:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP8:%.*]] = shl nuw nsw i64 [[TMP7]], 2			; CHECK-NEXT: [[TMP7:%.*]] = shl nuw nsw i64 [[TMP6]], 2
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP8]]			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024
	; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[TMP10:%.*]] = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> [[TMP5]])			; CHECK-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> [[TMP4]])
	; CHECK-NEXT: [[TMP11:%.*]] = call fast float @llvm.vector.reduce.fadd.nxv4f32(float -0.000000e+00, <vscale x 4 x float> [[TMP6]])			; CHECK-NEXT: [[TMP10:%.*]] = call fast float @llvm.vector.reduce.fadd.nxv4f32(float -0.000000e+00, <vscale x 4 x float> [[TMP5]])
	; CHECK-NEXT: [[CMP_N_NOT:%.*]] = icmp eq i64 [[N_VEC]], 0			; CHECK-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
	; CHECK-NEXT: br i1 [[CMP_N_NOT]], label [[SCALAR_PH]], label [[FOR_COND_CLEANUP:%.*]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ poison, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi float [ [[TMP11]], [[MIDDLE_BLOCK]] ], [ poison, [[ENTRY]] ]
	; CHECK-NEXT: [[BC_MERGE_RDX2:%.*]] = phi i32 [ [[TMP10]], [[MIDDLE_BLOCK]] ], [ poison, [[ENTRY]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: [[ADD_LCSSA:%.*]] = phi i32 [ [[ADD:%.*]], [[FOR_BODY]] ], [ [[TMP10]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[ADD_LCSSA:%.*]] = phi i32 [ poison, [[FOR_BODY]] ], [ [[TMP9]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: [[ADD3_LCSSA:%.*]] = phi float [ [[ADD3:%.*]], [[FOR_BODY]] ], [ [[TMP11]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[ADD3_LCSSA:%.*]] = phi float [ poison, [[FOR_BODY]] ], [ [[TMP10]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: store i32 [[ADD_LCSSA]], ptr @SA, align 4			; CHECK-NEXT: store i32 [[ADD_LCSSA]], ptr @SA, align 4
	; CHECK-NEXT: store float [[ADD3_LCSSA]], ptr @SB, align 4			; CHECK-NEXT: store float [[ADD3_LCSSA]], ptr @SB, align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: br i1 poison, label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP21:![0-9]+]]
	; CHECK-NEXT: [[SUMB_014:%.*]] = phi float [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[ADD3]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[SUMA_013:%.*]] = phi i32 [ [[BC_MERGE_RDX2]], [[SCALAR_PH]] ], [ [[ADD]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[A:%.*]] = getelementptr inbounds [[STRUCT_INTFLOAT]], ptr [[P]], i64 [[INDVARS_IV]], i32 0
	; CHECK-NEXT: [[LOAD1:%.*]] = load i32, ptr [[A]], align 4
	; CHECK-NEXT: [[ADD]] = add nsw i32 [[LOAD1]], [[SUMA_013]]
	; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds [[STRUCT_INTFLOAT]], ptr [[P]], i64 [[INDVARS_IV]], i32 1
	; CHECK-NEXT: [[LOAD2:%.*]] = load float, ptr [[B]], align 4
	; CHECK-NEXT: [[ADD3]] = fadd fast float [[SUMB_014]], [[LOAD2]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 1024
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP21:![0-9]+]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.cond.cleanup: ; preds = %for.body			for.cond.cleanup: ; preds = %for.body
	store i32 %add, i32* @SA, align 4			store i32 %add, i32* @SA, align 4
	store float %add3, float* @SB, align 4			store float %add3, float* @SB, align 4
	ret void			ret void
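In @int_float_struct the vector loop now always covers all 1024 iterations, so the scalar loop's [[BC_MERGE_RDX]] phis disappear and the exit phis take poison from for.body. The fold depends on vscale being a power of two (the summary ties this to D155350). A minimal sketch of the tail size 1024 % (4 * vscale), assuming the same range [1, 16] and using a hypothetical non-power-of-two vscale of 3 purely for contrast, illustrates why the assumption is needed.

```cpp
#include <cstdint>

using u64 = std::uint64_t;

constexpr u64 kTripCount = 1024; // icmp eq i64 [[INDEX_NEXT]], 1024

// Iterations left for the scalar loop when VF = 4 * vscale.
constexpr u64 tailIterations(u64 vscale) { return kTripCount % (4 * vscale); }

constexpr bool isPow2(u64 x) { return x != 0 && (x & (x - 1)) == 0; }

// Every power-of-two vscale in the assumed range [1, 16] leaves no tail,
// so the scalar loop (and its reduction merge phis) is dead.
constexpr bool noTailForPow2() {
  for (u64 v = 1; v <= 16; ++v)
    if (isPow2(v) && tailIterations(v) != 0) return false;
  return true;
}

static_assert(noTailForPow2(), "no scalar tail for power-of-two vscale");
// Hypothetical vscale = 3 (VF = 12): 1024 = 85 * 12 + 4, so 4 iterations
// would still need the scalar loop.
static_assert(tailIterations(3) == 4, "non-power-of-two vscale leaves a tail");
```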

llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll

for.end:		for.end:
ret i32 %var5		ret i32 %var5
}		}

define void @phi_used_in_vector_compare_and_scalar_indvar_update_and_store(ptr %ptr) #0 {		define void @phi_used_in_vector_compare_and_scalar_indvar_update_and_store(ptr %ptr) #0 {
; CHECK-LABEL: @phi_used_in_vector_compare_and_scalar_indvar_update_and_store(		; CHECK-LABEL: @phi_used_in_vector_compare_and_scalar_indvar_update_and_store(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]		; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:		; CHECK: vector.ph:
; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[DOTNEG:%.*]] = mul nuw nsw i64 [[TMP0]], 2046
; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[DOTNEG]], 1024
; CHECK-NEXT: [[TMP1:%.*]] = shl nuw nsw i64 [[N_VEC]], 1
; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[PTR:%.*]], i64 [[TMP1]]
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]		; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK-NEXT: [[POINTER_PHI:%.*]] = phi ptr [ [[PTR]], [[VECTOR_PH]] ], [ [[PTR_IND:%.*]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[POINTER_PHI:%.*]] = phi ptr [ [[PTR:%.*]], [[VECTOR_PH]] ], [ [[PTR_IND:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()		; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP3:%.*]] = shl nuw nsw i64 [[TMP2]], 2		; CHECK-NEXT: [[TMP1:%.*]] = shl nuw nsw i64 [[TMP0]], 2
; CHECK-NEXT: [[TMP4:%.*]] = call <vscale x 2 x i64> @llvm.experimental.stepvector.nxv2i64()		; CHECK-NEXT: [[TMP2:%.*]] = call <vscale x 2 x i64> @llvm.experimental.stepvector.nxv2i64()
; CHECK-NEXT: [[VECTOR_GEP:%.*]] = shl <vscale x 2 x i64> [[TMP4]], shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i64 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)		; CHECK-NEXT: [[VECTOR_GEP:%.*]] = shl <vscale x 2 x i64> [[TMP2]], shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 1, i64 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
; CHECK-NEXT: [[TMP5:%.*]] = getelementptr i8, ptr [[POINTER_PHI]], <vscale x 2 x i64> [[VECTOR_GEP]]		; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i8, ptr [[POINTER_PHI]], <vscale x 2 x i64> [[VECTOR_GEP]]
; CHECK-NEXT: [[TMP6:%.*]] = icmp ne <vscale x 2 x ptr> [[TMP5]], zeroinitializer		; CHECK-NEXT: [[TMP4:%.*]] = icmp ne <vscale x 2 x ptr> [[TMP3]], zeroinitializer
; CHECK-NEXT: [[TMP7:%.*]] = extractelement <vscale x 2 x ptr> [[TMP5]], i64 0		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <vscale x 2 x ptr> [[TMP3]], i64 0
; CHECK-NEXT: call void @llvm.masked.store.nxv2i16.p0(<vscale x 2 x i16> zeroinitializer, ptr [[TMP7]], i32 2, <vscale x 2 x i1> [[TMP6]])		; CHECK-NEXT: call void @llvm.masked.store.nxv2i16.p0(<vscale x 2 x i16> zeroinitializer, ptr [[TMP5]], i32 2, <vscale x 2 x i1> [[TMP4]])
; CHECK-NEXT: [[TMP8:%.*]] = call i64 @llvm.vscale.i64()		; CHECK-NEXT: [[TMP6:%.*]] = call i64 @llvm.vscale.i64()
; CHECK-NEXT: [[TMP9:%.*]] = shl nuw nsw i64 [[TMP8]], 1		; CHECK-NEXT: [[TMP7:%.*]] = shl nuw nsw i64 [[TMP6]], 1
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP9]]		; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP7]]
; CHECK-NEXT: [[PTR_IND]] = getelementptr i8, ptr [[POINTER_PHI]], i64 [[TMP3]]		; CHECK-NEXT: [[PTR_IND]] = getelementptr i8, ptr [[POINTER_PHI]], i64 [[TMP1]]
; CHECK-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]		; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024
; CHECK-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]		; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
; CHECK: middle.block:		; CHECK: middle.block:
; CHECK-NEXT: [[CMP_N_NOT:%.*]] = icmp eq i64 [[N_VEC]], 0		; CHECK-NEXT: br i1 true, label [[FOR_END:%.*]], label [[SCALAR_PH]]
; CHECK-NEXT: br i1 [[CMP_N_NOT]], label [[SCALAR_PH]], label [[FOR_END:%.*]]
; CHECK: scalar.ph:		; CHECK: scalar.ph:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ poison, [[ENTRY:%.*]] ]
; CHECK-NEXT: [[BC_RESUME_VAL1:%.*]] = phi ptr [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ poison, [[ENTRY]] ]
; CHECK-NEXT: br label [[FOR_BODY:%.*]]		; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.body:		; CHECK: for.body:
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[INC:%.*]], [[IF_END:%.*]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]		; CHECK-NEXT: br i1 poison, label [[IF_END_SINK_SPLIT:%.*]], label [[IF_END:%.*]]
; CHECK-NEXT: [[IV_PTR:%.*]] = phi ptr [ [[INCDEC_IV_PTR:%.*]], [[IF_END]] ], [ [[BC_RESUME_VAL1]], [[SCALAR_PH]] ]
; CHECK-NEXT: [[CMP_I_NOT:%.*]] = icmp eq ptr [[IV_PTR]], null
; CHECK-NEXT: br i1 [[CMP_I_NOT]], label [[IF_END]], label [[IF_END_SINK_SPLIT:%.*]]
; CHECK: if.end.sink.split:		; CHECK: if.end.sink.split:
; CHECK-NEXT: store i16 0, ptr [[IV_PTR]], align 2
; CHECK-NEXT: br label [[IF_END]]		; CHECK-NEXT: br label [[IF_END]]
; CHECK: if.end:		; CHECK: if.end:
; CHECK-NEXT: [[INCDEC_IV_PTR]] = getelementptr inbounds i16, ptr [[IV_PTR]], i64 1		; CHECK-NEXT: br i1 poison, label [[FOR_BODY]], label [[FOR_END]], !llvm.loop [[LOOP10:![0-9]+]]
; CHECK-NEXT: [[INC]] = add nuw nsw i64 [[IV]], 1
; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp ult i64 [[IV]], 1023
; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_BODY]], label [[FOR_END]], !llvm.loop [[LOOP10:![0-9]+]]
; CHECK: for.end:		; CHECK: for.end:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %if.end, %entry		for.body: ; preds = %if.end, %entry
%iv = phi i64 [ %inc, %if.end ], [ 0, %entry ]		%iv = phi i64 [ %inc, %if.end ], [ 0, %entry ]
Revision Contents

Diff 546049

llvm/lib/Analysis/InstructionSimplify.cpp

llvm/test/Transforms/InstCombine/rem-mul-shl.ll

llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll

llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll
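In the updated CHECK lines for sve-widen-phi.ll, the vector latch compares `[[INDEX_NEXT]]` against the constant 1024, and middle.block branches on `true` directly to for.end, so the scalar remainder loop is dead (its branches become `poison`). The arithmetic behind this is easy to check by hand. The vector step is `vscale << 1`. If vscale is a power of two (the assumption the summary relies on, from D155350) and at most 16 (SVE's architectural maximum, assumed here as the upper bound), then VF always divides 1024, so the vector trip count equals the full trip count. The C++ sketch below checks that claim by enumeration. `vectorTripCount` and `remainderAlwaysEmpty` are hypothetical helpers for illustration only; they are not functions from InstructionSimplify.cpp, and they do not reproduce the actual fold.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the vector trip count, i.e. the trip count rounded
// down to a multiple of VF: n.vec = n - (n urem VF).
uint64_t vectorTripCount(uint64_t n, uint64_t vf) {
  assert(vf != 0 && "VF must be non-zero");
  return n - n % vf;
}

// Hypothetical helper: returns true if the scalar remainder loop runs zero
// iterations for every power-of-two vscale in [1, maxVScale], assuming
// VF = vscale * 2 (the `shl nuw nsw i64 vscale, 1` step in the test).
bool remainderAlwaysEmpty(uint64_t n, uint64_t maxVScale) {
  for (uint64_t vscale = 1; vscale <= maxVScale; vscale <<= 1) {
    uint64_t vf = vscale << 1;
    if (vectorTripCount(n, vf) != n)  // some iterations left for scalar.ph
      return false;
  }
  return true;
}
```

For example, `remainderAlwaysEmpty(1024, 16)` returns true. `remainderAlwaysEmpty(1000, 16)` returns false: once vscale reaches 8 (VF = 16), 1000 urem 16 is 8, so a scalar tail would still be needed.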
