This is an archive of the discontinued LLVM Phabricator instance.

Differential D133433

[AArch64]: Force generating code compatible to streaming mode
ClosedPublic

Authored by hassnaa-arm on Sep 7 2022, 9:40 AM.

Details

Reviewers

paulwalker-arm
sdesmalen
david-arm
kmclaughlin

Commits

rG2c72d90ecc69: [AArch64-SVE]: Force generating code compatible to streaming mode.

Summary

Add a compiler flag that forces streaming-compatible code generation.
When the flag is enabled, basic loads and stores of fixed-width vectors are lowered
so that the generated code is compatible with streaming mode.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

hassnaa-arm created this revision.Sep 7 2022, 9:40 AM

Herald added a project: Restricted Project.Sep 7 2022, 9:40 AM

Herald added subscribers: ctetreau, hiraditya, kristof.beyls, tschuett.

hassnaa-arm requested review of this revision.Sep 7 2022, 9:40 AM

Herald added a project: Restricted Project.Sep 7 2022, 9:40 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B185433: Diff 458482.Sep 7 2022, 10:29 AM

Matt added a subscriber: Matt.Sep 8 2022, 12:11 AM

hassnaa-arm added a reviewer: david-arm.Sep 8 2022, 3:15 AM

paulwalker-arm added inline comments.Sep 8 2022, 3:18 AM

llvm/lib/Target/AArch64/AArch64Subtarget.cpp
436	This is going to enable a lot more code paths than are currently tested. Can you explain the rationale for the new flag? Does it relate to SME's streaming-compatible mode, or is it wanted for other reasons?

Adding store test cases to sve-fixed-length-masked-stores.ll

Harbormaster completed remote builds in B185604: Diff 458713.Sep 8 2022, 7:17 AM

Rename force-sve-128bit-vector to force-sve-when-streaming-compatible.
The flag was renamed because it relates to streaming mode: NEON cannot be used in streaming mode, so we force the use of SVE.

Add RUN line with flag of --force-sve-when-streaming-compatible to sve-fixed-length-masked-loads.ll

Harbormaster completed remote builds in B185621: Diff 458737.Sep 8 2022, 8:25 AM

Matt added inline comments.Sep 8 2022, 4:21 PM

llvm/lib/Target/AArch64/AArch64Subtarget.cpp
436	> This is going to enable a lot more code paths than are currently tested.

Out of curiosity, I've run a quick check for a potentially related issue, https://github.com/llvm/llvm-project/issues/56412. I'm no longer encountering the ICE when compiling "sve-fixed-length-masked-gather.ll" with this option enabled, as in `llc --force-sve-when-streaming-compatible -aarch64-sve-vector-bits-min=128 -mtriple=arm64-unknown-unknown -mcpu=neoverse-n2 sve-fixed-length-masked-gather.ll`. At the same time, there's no impact on the generated code (identical assembly); in any case, that's strictly better than an ICE.

@hassnaa-arm, just to be on the safe side, could you run a quick check on your end to make sure that you're not encountering any issues with "sve-fixed-length-masked-gather.ll" either? And perhaps add a RUN line with `llc --force-sve-when-streaming-compatible -aarch64-sve-vector-bits-min=128` to that file (as it has caused an ICE with 128-bit SVE compilation before), if that would be no trouble?

@paulwalker-arm, if this patch gets accepted and the above test works fine, perhaps that would offer a way to close https://github.com/llvm/llvm-project/issues/56412?

Matt added inline comments.Sep 8 2022, 4:42 PM

llvm/lib/Target/AArch64/AArch64Subtarget.cpp

436

Update: I've tested on the whole file and the ICE does appear, after all.
The difference is that now the affected function is masked_gather_v8f16 (whereas previously compiling masked_gather_v2f16 alone was sufficient to trigger the ICE--now it no longer does).

After compiling with llc --force-sve-when-streaming-compatible -aarch64-sve-vector-bits-min=128 -mtriple=arm64-unknown-unknown -mcpu=neoverse-n2 sve-fixed-length-masked-gather.ll:

PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: llc --force-sve-when-streaming-compatible -aarch64-sve-vector-bits-min=128 -mtriple=arm64-unknown-unknown -mcpu=neoverse-n2 sve-fixed-length-masked-gather.ll
1.      Running pass 'Function Pass Manager' on module 'sve-fixed-length-masked-gather.ll'.
2.      Running pass 'AArch64 Instruction Selection' on function '@masked_gather_v8f16'
  #0 0x00007f1e7892db26 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /path/to/llvm-project/llvm/lib/Support/Unix/Signals.inc:573:3
  #1 0x00007f1e7892b9ad llvm::sys::RunSignalHandlers() /path/to/llvm-project/llvm/lib/Support/Signals.cpp:103:20
  #2 0x00007f1e7892bb2c SignalHandler(int) /path/to/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1
  #3 0x00007f1e77a42210 __restore_rt (/lib64/libc.so.6+0x3a210)
  #4 0x00007f1e7bec836c llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode*) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:503:33
  #5 0x00007f1e7bec836c llvm::DAGTypeLegalizer::AnalyzeNewValue(llvm::SDValue&) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:571:14
  #6 0x00007f1e7bec847b llvm::SDValue::getNode() const /path/to/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:159:36
  #7 0x00007f1e7bec847b llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode*) (.part.0) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:525:32
  #8 0x00007f1e7bec8371 llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode*) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:503:33
  #9 0x00007f1e7bec8371 llvm::DAGTypeLegalizer::AnalyzeNewValue(llvm::SDValue&) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:571:14
 #10 0x00007f1e7bec847b llvm::SDValue::getNode() const /path/to/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:159:36
 #11 0x00007f1e7bec847b llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode*) (.part.0) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:525:32
 #12 0x00007f1e7bec8371 llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode*) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:503:33
 #13 0x00007f1e7bec8371 llvm::DAGTypeLegalizer::AnalyzeNewValue(llvm::SDValue&) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:571:14
 #14 0x00007f1e7bec847b llvm::SDValue::getNode() const /path/to/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:159:36
. . .

(the remaining part similarly recurring as in the aforementioned GitHub issue).

@Matt: This work is orthogonal to https://github.com/llvm/llvm-project/issues/56412. In streaming mode, gather/scatter instructions are not available, so we'll have code to mark the associated intrinsics as illegal, and they'll be scalarised before reaching code generation. This doesn't take away the importance of the GitHub issue, which will be resolved when we specifically enable gather/scatter code generation for 128-bit vectors.
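
To make "scalarised" concrete: a masked gather on a fixed-width vector has the same observable result as one ordinary scalar load per active lane. The following is a minimal C++ sketch of that semantics only. It is not LLVM code, and the name `scalarisedMaskedGather` and its parameters are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Semantic model of a masked gather over N lanes, written as the per-lane
// scalar loop that scalarisation is equivalent to.
// Hypothetical helper for illustration; not an LLVM API.
template <typename T, std::size_t N>
std::array<T, N> scalarisedMaskedGather(const std::array<const T *, N> &Ptrs,
                                        const std::array<bool, N> &Mask,
                                        const std::array<T, N> &PassThru) {
  std::array<T, N> Result = PassThru; // inactive lanes keep the pass-through value
  for (std::size_t Lane = 0; Lane < N; ++Lane) {
    if (Mask[Lane]) {
      assert(Ptrs[Lane] != nullptr && "active lane needs a valid pointer");
      Result[Lane] = *Ptrs[Lane]; // one ordinary scalar load per active lane
    }
  }
  return Result;
}
```

Because every access is a plain scalar load, nothing in this form depends on gather instructions being available, which is the property the comment above relies on.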

llvm/lib/Target/AArch64/AArch64Subtarget.cpp
69	Can this be "force-streaming-compatible-mode"? which I believe better reflects your intent.

@paulwalker-arm: I see, thanks!

Force using SVE in streaming mode.

hassnaa-arm added a reviewer: kmclaughlin.Sep 29 2022, 8:55 AM

Harbormaster completed remote builds in B189411: Diff 463893.Sep 29 2022, 9:22 AM

Force SVE in Streaming Mode for all types of load/store

Rename new load/store files

Harbormaster completed remote builds in B190002: Diff 464714.Oct 3 2022, 10:20 AM

david-arm added inline comments.Oct 4 2022, 7:41 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

1397 ↗

(On Diff #464714)

I think at some point we probably want to combine this with the code below:

if (Subtarget->useSVEForFixedLengthVectors()) {
  for (MVT VT : MVT::integer_fixedlen_vector_valuetypes())
    if (useSVEForFixedLengthVectorVT(VT))
      addTypeForFixedLengthSVE(VT);

The problem is that addTypeForFixedLengthSVE will add a whole bunch of opcodes all at once, which we're probably not ready for.

@hassnaa-arm perhaps you can simplify this to something like:

if (Subtarget->forceSVEInStreamingMode()) {
  for (MVT VT : MVT::integer_fixedlen_vector_valuetypes())
    if (useSVEForFixedLengthVectorVT(VT, true))
      addTypeForStreamingSVE(VT);
  for (MVT VT : MVT::fp_fixedlen_vector_valuetypes())
    if (useSVEForFixedLengthVectorVT(VT, true))
      addTypeForStreamingSVE(VT);
}

where you add a function called addTypeForStreamingSVE a bit similar to addTypeForFixedLengthSVE. For now it would just be:

void addTypeForStreamingSVE(MVT VT) {
  setOperationAction(ISD::ANY_EXTEND, VT, Custom);
  setOperationAction(ISD::ZERO_EXTEND, VT, Custom);
  setOperationAction(ISD::SIGN_EXTEND, VT, Custom);
  setOperationAction(ISD::LOAD, VT, Custom);
  setOperationAction(ISD::CONCAT_VECTORS, VT, Custom);
}

What do you think?
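
As a rough illustration of the idea behind the suggested helper (register a small, deliberately limited set of opcodes per type, and leave everything else untouched), here is a self-contained toy model. It assumes nothing about LLVM's real classes: `ToyLowering`, `Opcode`, `VT` and `Action` are hypothetical stand-ins for `TargetLowering`, the `ISD` opcodes, `MVT` and `LegalizeAction`.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <utility>

// Toy stand-ins for ISD opcodes, MVTs and LegalizeAction (hypothetical names).
enum class Opcode { AnyExtend, ZeroExtend, SignExtend, Load, ConcatVectors, Add };
enum class VT { v8i8, v4i16, v16i8, v4f32 };
enum class Action { Legal, Custom };

class ToyLowering {
  std::map<std::pair<Opcode, VT>, Action> Actions;

public:
  void setOperationAction(Opcode Op, VT Ty, Action A) { Actions[{Op, Ty}] = A; }

  // Anything that was never registered is treated as Legal in this model.
  Action getOperationAction(Opcode Op, VT Ty) const {
    auto It = Actions.find({Op, Ty});
    return It == Actions.end() ? Action::Legal : It->second;
  }

  // Same shape as the suggested helper: only a handful of opcodes per type,
  // so other operations on that type are not affected yet.
  void addTypeForStreamingSVE(VT Ty) {
    for (Opcode Op : {Opcode::AnyExtend, Opcode::ZeroExtend, Opcode::SignExtend,
                      Opcode::Load, Opcode::ConcatVectors})
      setOperationAction(Op, Ty, Action::Custom);
  }
};
```

In this model, calling `addTypeForStreamingSVE(VT::v16i8)` marks `Load` on `v16i8` as `Custom` while `Add` on the same type stays `Legal`, which mirrors the concern about not enabling a whole bunch of opcodes at once.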

kmclaughlin added inline comments.Oct 4 2022, 9:33 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1397 ↗	(On Diff #464714)	I also think it would be good to try and simplify this, and for now it might also be worth adding a `TODO` to explain that these functions will be combined once all of the opcodes have been covered?
llvm/test/CodeGen/AArch64/sve-fixed-length-masked-stores.ll
1893	nit: extra whitespace :)
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-ext-loads.ll
8 ↗	(On Diff #464714)	The name of this test doesn't look quite right - I think for this one it should be something like `@load_zext_v8i8i16`, the next one should be `@load_zext_v4i16i32`, etc. I could be wrong, I am just comparing to some of the existing tests we have in `sve-fixed-length-ext-loads.ll`.
318 ↗	(On Diff #464714)	Is it worth adding tests where the type being extended from is also illegal? Something like this:
  %a = load <16 x i16>, <16 x i16>* %ap
  %val = sext <16 x i16> %a to <16 x i64>
  ret <16 x i64> %val
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-loads.ll
17 ↗	(On Diff #464714)	There don't seem to be any check lines for any of the `VBITS_GE_*` labels added here? Maybe if they are not needed you could remove the extra labels, or add some check lines to match the ones you need. I think fixing this will also remove the note added at the bottom of this test.
76 ↗	(On Diff #464714)	Can you please add a test using a load of type which is illegal for Neon, e.g. `32 x float`?
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-masked-store.ll
4 ↗	(On Diff #464714)	As with the fixed-length-loads.ll test, I think removing the unused labels here (and in the other test files below) or adding check lines for them will remove the warnings added by the test script.
104 ↗	(On Diff #464714)	Can you please also add a test with an illegal Neon type?

Hi @hassnaa-arm, I've not had a chance to fully review this yet, but could you rename the title to something like

[AArch64][SVE]: Lower all types of load/store of 128-bit fixed-width vector using SVE

? This patch is now lowering more than just masked loads/stores.

david-arm mentioned this in D135324: [AArch64-SVE]: force using SVE in streaming mode to lower arithmetic and logical fixed-width vector ops..Oct 6 2022, 1:10 AM

set custom action only in case of streaming mode
add some illegal NEON tests

hassnaa-arm added a child revision: D135324: [AArch64-SVE]: force using SVE in streaming mode to lower arithmetic and logical fixed-width vector ops..Oct 6 2022, 3:57 AM

hassnaa-arm retitled this revision from [AArch64-SVE]: lower masked load/store of 128-bit fixed-width vectors to [AArch64-SVE]: lower all types of loads and stores of fixed-width vector.Oct 6 2022, 4:02 AM

Harbormaster completed remote builds in B190695: Diff 465685.Oct 6 2022, 4:43 AM

sdesmalen added inline comments.Oct 6 2022, 8:39 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1398 ↗	(On Diff #465685)	nit: unnecessary curly braces here and below.
12404 ↗	(On Diff #465685)	Are you doing these as part of the same patch, because tests like `sve-fixed-length-int-shifts.ll` require loads and stores to work? I wonder if you can still split out the unpredicated, non-extending/truncating loads/stores from this patch, such that you can have:
  1. patch for basic loads/stores + anything required to make these work (including tests)
  2. patches for shifts, concat_vectors, build_vector, vector_shuffle, extract_vector_elt, extract_subvector (or at least as many that are not required for (1)), with corresponding tests (e.g. `sve-fixed-length-int-shifts.ll` -> `sve-streaming-compatible-fixed-length-int-shifts.ll`), because these tests are currently missing.
  3. patch for masked and extending/truncating loads and stores.
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
3036 ↗	(On Diff #465685)	I think you can just write: let AddedComplexity = 1, Predicates = [IsForcingSVEDisabled] in { ... } which would avoid the extra indentation.
llvm/lib/Target/AArch64/AArch64Subtarget.cpp
69	Can you rename this variable to `ForceStreamingCompatibleSVE`? (likewise change the name of the flag to `-force-streaming-compatible-sve`) The current name `ForceSVEWhenStreamingCompatible` suggests to use the full range of SVE instructions when in streaming-compatible mode, even the instructions that would be illegal in that mode, but that would be incorrect.

hassnaa-arm marked 2 inline comments as done.Oct 7 2022, 3:07 AM

hassnaa-arm added inline comments.

llvm/lib/Target/AArch64/AArch64Subtarget.cpp
69	So there are some SVE instructions that are illegal in streaming mode? Like what? I was only checking for illegal NEON instructions, not illegal SVE instructions.

Split out the patch. This patch now contains only the work related to basic loads and stores.

Remove commented lines

hassnaa-arm retitled this revision from [AArch64-SVE]: lower all types of loads and stores of fixed-width vector to [AArch64-SVE]: Force using streaming compatible mode.Oct 7 2022, 3:06 PM

hassnaa-arm edited the summary of this revision. (Show Details)

Harbormaster completed remote builds in B191031: Diff 466195.Oct 7 2022, 3:56 PM

hassnaa-arm retitled this revision from [AArch64-SVE]: Force using streaming compatible mode to [AArch64]: Force generating code compatible to streaming mode.Oct 9 2022, 3:18 PM

hassnaa-arm edited the summary of this revision. (Show Details)

Rename flag of force-streaming-compatible-mode to force-streaming-mode-compatible-sve

hassnaa-arm added a child revision: D135564: [AArch64-SVE]: Force generating code compatible to streaming mode..Oct 10 2022, 2:08 AM

Harbormaster completed remote builds in B191234: Diff 466452.Oct 10 2022, 2:36 AM

Thanks @hassnaa-arm! I just left a few more nits mostly around the names, but other than that it looks really good!

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12027 ↗	(On Diff #466452)	nit: comment can be removed?
12391 ↗	(On Diff #466452)	nit: comment can be removed?
12404 ↗	(On Diff #466452)	nit: comment can be removed?
llvm/lib/Target/AArch64/AArch64Subtarget.cpp
69	Streaming SVE is a subset of SVE, in that some SVE instructions (e.g. gather/scatter) are not valid in Streaming Mode.
69	nit: I think that 'mode' is kind of implied from 'streaming compatible', so you can remove `-mode` from the name, i.e. `force-streaming-mode-compatible-sve -> force-streaming-compatible-sve`. Same request for the name of the variable, i.e. `ForceStreamingModeCompatibleSVE -> ForceStreamingCompatibleSVE`
437	Should this return `true` always and instead have `assert(hasSVE() && "Expected SVE to be available")`? If someone forces using streaming-compatible code, SVE must be available. (And given that it's not a user-exposed feature in Clang, it's fine for the compiler to crash if someone uses this feature while forgetting to set `+sve` somehow.)
443	nit: `forceStreamingCompatibleSVE` (see other comment above)
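
The comments above all concern one subtarget query. A self-contained sketch of how the forced mode and the existing 256-bit threshold could combine is shown below. It assumes a simplified stand-in, `ToySubtarget` (hypothetical; the real class is `AArch64Subtarget` and carries far more state), and models the hidden command-line flag as a plain boolean member.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-in for the relevant slice of
// AArch64Subtarget. The boolean member models the hidden llc option.
struct ToySubtarget {
  bool HasSVE = false;
  unsigned MinSVEVectorSizeInBits = 0;
  bool ForceStreamingCompatibleSVE = false;

  bool hasSVE() const { return HasSVE; }

  bool useSVEForFixedLengthVectors() const {
    if (ForceStreamingCompatibleSVE) {
      // Forcing streaming-compatible code without SVE is a usage error,
      // so crashing here (in asserts builds) is considered acceptable.
      assert(hasSVE() && "Expected SVE to be available");
      return true;
    }
    // Prefer NEON unless larger SVE registers are available.
    return hasSVE() && MinSVEVectorSizeInBits >= 256;
  }
};
```

With the flag off, a 128-bit minimum SVE vector length still selects NEON; with the flag on, the same subtarget answers `true`, which is what routes NEON-sized vectors through SVE lowering.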

paulwalker-arm added inline comments.Oct 10 2022, 11:15 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1398 ↗	(On Diff #466452)	Unlike `useSVEForFixedLengthVectors()` mode where the MVTs are not known ahead of time, the use case for `forceStreamingModeCompatibleSVE()` mode is specific to 128-bit and 64-bit vectors and so you shouldn't need to iterate across all vector types.
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-stores.ll
3–17 ↗	(On Diff #466452)	You shouldn't need to test all these combinations. It should be sufficient to test without any `-aarch64-sve-vector-bits-min=` options as that's the expected use case. For the tests themselves you want to add some that use 256-bit vectors to verify we don't emit NEON instructions as part of type legalisation.
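
The two points above (only 64-bit and 128-bit vectors need handling, and wider vectors should be tested because they are broken down during type legalisation) can be sketched with a small model. This is an assumption-laden simplification: `FixedVecTy`, `isNeonSized` and `splitToNeonSized` are hypothetical names, only power-of-two sizes of at least 64 bits are handled, and real type legalisation also widens and promotes types.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of a fixed-width vector type.
struct FixedVecTy {
  unsigned ElementBits;
  unsigned NumElements;
  unsigned sizeInBits() const { return ElementBits * NumElements; }
};

// True for the NEON-sized types the flag is concerned with (64 or 128 bits).
bool isNeonSized(const FixedVecTy &Ty) {
  unsigned Bits = Ty.sizeInBits();
  return Bits == 64 || Bits == 128;
}

// Simplified model of splitting: halve an over-wide vector until every
// piece is NEON-sized. Assumes a power-of-two size of at least 64 bits.
std::vector<FixedVecTy> splitToNeonSized(FixedVecTy Ty) {
  assert(Ty.sizeInBits() >= 64 && "narrow vectors are not modelled here");
  if (isNeonSized(Ty))
    return {Ty};
  assert(Ty.NumElements % 2 == 0 && "cannot halve an odd element count");
  FixedVecTy Half{Ty.ElementBits, Ty.NumElements / 2};
  std::vector<FixedVecTy> Parts = splitToNeonSized(Half);
  std::vector<FixedVecTy> Upper = Parts; // the other half splits identically
  Parts.insert(Parts.end(), Upper.begin(), Upper.end());
  return Parts;
}
```

In this model a 256-bit `<8 x i32>` becomes two 128-bit `<4 x i32>` pieces, so a test using 256-bit vectors still exercises the 128-bit path and would expose any NEON instruction that slips through.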

hassnaa-arm marked 8 inline comments as done.Oct 11 2022, 3:53 AM

set operation action as custom only for 128-bit and 64-bit vectors instead of all types

Thanks for the changes to this patch. This one looks good to me now!

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
12028 ↗	(On Diff #466762)	nit: it seems you missed this one (can be removed)

This revision is now accepted and ready to land.Oct 11 2022, 4:01 AM

Harbormaster completed remote builds in B191469: Diff 466762.Oct 11 2022, 4:38 AM

paulwalker-arm added inline comments.Oct 11 2022, 7:39 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1398 ↗	(On Diff #466762)	I don't see tests for this and some of the other MVTs. Do we need explicit handling for `MVT::v1i64` and `MVT::v1f64`? I would have thought these would just emit a scalar access, although there's no tests to show this.
1400 ↗	(On Diff #466762)	Is this check (plus the one for the float loop) necessary? I would expect that when `forceStreamingCompatibleSVE()` returns true we have no choice but to enable the custom lowering.
llvm/lib/Target/AArch64/AArch64Subtarget.cpp
436–439	`if (forceStreamingCompatibleSVE()) return true;` Doing this might mean `useSVEForFixedLengthVectors` can remain defined in the header file.
448	Should this also include `|| hasSME()`?
llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-stores.ll
26 ↗	(On Diff #466762)	The patch summary mentions lowering stores and you've added this test but I don't see any code to enable such lowering and hence we are seeing NEON str instructions.

hassnaa-arm removed a child revision: D135324: [AArch64-SVE]: force using SVE in streaming mode to lower arithmetic and logical fixed-width vector ops..Oct 11 2022, 10:17 PM

Updated by parent patch

Add additional test cases

hassnaa-arm marked an inline comment as not done.Oct 13 2022, 5:36 AM

Harbormaster completed remote builds in B191951: Diff 467454.Oct 13 2022, 7:56 AM

Remove custom lowering of ISD::LOAD.
Fixed-length loads and stores do not need custom lowering of ISD::LOAD.
It was added because I thought that ldr is illegal in streaming mode, but it is legal, so the custom lowering is no longer needed.

Harbormaster completed remote builds in B192003: Diff 467531.Oct 13 2022, 11:53 AM

paulwalker-arm added inline comments.Oct 13 2022, 4:21 PM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1397–1405 ↗	(On Diff #467531)	I understand why this code block and `addTypeForStreamingSVE()` exist, but given they don't do anything within this patch anymore I think they're best moved into one of your other patches.
5775–5776 ↗	(On Diff #467531)	Is this still necessary now that you're no longer custom lowering `ISD::LOAD` for NEON sized vectors?
llvm/lib/Target/AArch64/AArch64InstrInfo.td
7138 ↗	(On Diff #467531)	Up to you but personally I think `NotInStreamingSVEMode` reads better.

Remove 'addTypeForStreamingSVE()' as it's not needed now.
I no longer set a Custom operation action for any node, so there is no need for addTypeForStreamingSVE().

Harbormaster completed remote builds in B192160: Diff 467745.Oct 14 2022, 5:41 AM

hassnaa-arm marked 2 inline comments as done.Oct 14 2022, 8:16 AM

As discussed offline there's more we can do here to improve code quality but this'll come as you increase ISD node coverage. I've one minor issue but otherwise this looks good to me.

llvm/lib/Target/AArch64/AArch64Subtarget.cpp
437	I guess this should match the code below, although I'm not quite sure why the assert is needed.

hassnaa-arm added inline comments.Oct 14 2022, 8:43 AM

llvm/lib/Target/AArch64/AArch64Subtarget.cpp
437	If someone forces using streaming-compatible code, SVE must be available. (And given that it's not a user-exposed feature in Clang, it's fine for the compiler to crash if someone uses this feature while forgetting to set +sve somehow.)

This revision was landed with ongoing or failed builds.Oct 14 2022, 10:47 AM

Closed by commit rG2c72d90ecc69: [AArch64-SVE]: Force generating code compatible to streaming mode. (authored by Hassnaa Hamdi <hassnaa.hamdi@arm.com>). · Explain Why

This revision was automatically updated to reflect the committed changes.

hassnaa-arm added a commit: rG2c72d90ecc69: [AArch64-SVE]: Force generating code compatible to streaming mode..

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64Subtarget.h	5 lines
llvm/lib/Target/AArch64/AArch64Subtarget.cpp	11 lines
llvm/test/CodeGen/AArch64/sve-fixed-length-masked-stores.ll	1473 lines
llvm/test/CodeGen/AArch64/sve-masked-load-store.ll	108 lines

Diff 458713

llvm/lib/Target/AArch64/AArch64Subtarget.h

[350 unchanged lines hidden; context: unsigned getMaxSVEVectorSizeInBits() const {]
     return MaxSVEVectorSizeInBits;
   }

   unsigned getMinSVEVectorSizeInBits() const {
     assert(HasSVE && "Tried to get SVE vector length without SVE support!");
     return MinSVEVectorSizeInBits;
   }

-  bool useSVEForFixedLengthVectors() const {
-    // Prefer NEON unless larger SVE registers are available.
-    return hasSVE() && getMinSVEVectorSizeInBits() >= 256;
-  }
+  bool useSVEForFixedLengthVectors() const;

   unsigned getVScaleForTuning() const { return VScaleForTuning; }

   const char* getChkStkName() const {
     if (isWindowsArm64EC())
       return "__chkstk_arm64ec";
     return "__chkstk";
   }
[11 unchanged lines hidden]

llvm/lib/Target/AArch64/AArch64Subtarget.cpp

Show First 20 Lines • Show All 59 Lines • ▼ Show 20 Lines
// allocator, but can still be used as ABI requests, such as passing arguments		// allocator, but can still be used as ABI requests, such as passing arguments
// to function call.		// to function call.
static cl::list<std::string>		static cl::list<std::string>
ReservedRegsForRA("reserve-regs-for-regalloc", cl::desc("Reserve physical "		ReservedRegsForRA("reserve-regs-for-regalloc", cl::desc("Reserve physical "
"registers, so they can't be used by register allocator. "		"registers, so they can't be used by register allocator. "
"Should only be used for testing register allocator."),		"Should only be used for testing register allocator."),
cl::CommaSeparated, cl::Hidden);		cl::CommaSeparated, cl::Hidden);

		static cl::opt<bool> ForceSVEFor128bitVectors("force-sve-128bit-vector",
		cl::init(false), cl::Hidden);
		paulwalker-armUnsubmitted Not Done Reply Inline Actions Can this be "force-streaming-compatible-mode"? which I believe better reflects your intent. paulwalker-arm: Can this be "force-streaming-compatible-mode"? which I believe better reflects your intent.
		sdesmalenUnsubmitted Not Done Reply Inline Actions Can you rename this variable to `ForceStreamingCompatibleSVE`? (likewise change the name of the flag to `-force-streaming-compatible-sve`) The current name `ForceSVEWhenStreamingCompatible` suggests to use the full range of SVE instructions when in streaming-compatible mode, even the instructions that would be illegal in that mode, but that would be incorrect. sdesmalen: Can you rename this variable to `ForceStreamingCompatibleSVE`? (likewise change the name of the…
		hassnaa-armAuthorUnsubmitted Done Reply Inline Actions so there are some SVE instruction that are illegal in streaming mode ? like what ? because I was checking only for NEON illegal instructions, not SVE illegal instruction. hassnaa-arm: so there are some SVE instruction that are illegal in streaming mode ? like what ? because I…
		sdesmalenUnsubmitted Not Done Reply Inline Actions Streaming SVE is a subset of SVE, in that some SVE instructions (e.g. gather/scatter) are not valid in Streaming Mode. sdesmalen: Streaming SVE is a subset of SVE, in that some SVE instructions (e.g. gather/scatter) are not…
		sdesmalenUnsubmitted Done Reply Inline Actions nit: I think that 'mode' is kind of implied from 'streaming compatible', so you remove `-mode` from the name, i.e. `force-streaming-mode-compatible-sve -> force-streaming-compatible-sve`. Same request for the name of the variable, i.e. `ForceStreamingModeCompatibleSVE -> ForceStreamingCompatibleSVE` sdesmalen: nit: I think that 'mode' is kind of implied from 'streaming compatible', so you remove `-mode`…

unsigned AArch64Subtarget::getVectorInsertExtractBaseCost() const {		unsigned AArch64Subtarget::getVectorInsertExtractBaseCost() const {
if (OverrideVectorInsertExtractBaseCost.getNumOccurrences() > 0)		if (OverrideVectorInsertExtractBaseCost.getNumOccurrences() > 0)
return OverrideVectorInsertExtractBaseCost;		return OverrideVectorInsertExtractBaseCost;
return VectorInsertExtractBaseCost;		return VectorInsertExtractBaseCost;
}		}

AArch64Subtarget &AArch64Subtarget::initializeSubtargetDependencies(		AArch64Subtarget &AArch64Subtarget::initializeSubtargetDependencies(
StringRef FS, StringRef CPUString, StringRef TuneCPUString) {		StringRef FS, StringRef CPUString, StringRef TuneCPUString) {
▲ Show 20 Lines • Show All 347 Lines • ▼ Show 20 Lines	void AArch64Subtarget::mirFileLoaded(MachineFunction &MF) const {
// bogus values after PEI has eliminated the callframe setup/destroy pseudo		// bogus values after PEI has eliminated the callframe setup/destroy pseudo
// instructions, specify explicitly if you need it to be correct.		// instructions, specify explicitly if you need it to be correct.
MachineFrameInfo &MFI = MF.getFrameInfo();		MachineFrameInfo &MFI = MF.getFrameInfo();
if (!MFI.isMaxCallFrameSizeComputed())		if (!MFI.isMaxCallFrameSizeComputed())
MFI.computeMaxCallFrameSize(MF);		MFI.computeMaxCallFrameSize(MF);
}		}

bool AArch64Subtarget::useAA() const { return UseAA; }		bool AArch64Subtarget::useAA() const { return UseAA; }

		bool AArch64Subtarget::useSVEForFixedLengthVectors() const {
		if (ForceSVEFor128bitVectors)
		paulwalker-armUnsubmitted Not Done Reply Inline Actions This is going to enable a lot more code paths than it currently tested. Can you explain the rational for the new flag? Does it relate to SME's streaming-compatible mode? or it is wanted for other reasons? paulwalker-arm: This is going to enable a lot more code paths than it currently tested. Can you explain the…
		MattUnsubmitted Not Done Reply Inline Actions This is going to enable a lot more code paths than it currently tested. Out of curiosity, I've run a quick check for a potentially related issue, https://github.com/llvm/llvm-project/issues/56412 I'm no longer encountering the ICE when compiling "sve-fixed-length-masked-gather.ll" with this option enabled, as in `llc --force-sve-when-streaming-compatible -aarch64-sve-vector-bits-min=128 -mtriple=arm64-unknown-unknown -mcpu=neoverse-n2 sve-fixed-length-masked-gather.ll`. At the same time, there's no impact on the generated code (identical assembly); in any case, that's strictly better than ICE. @hassnaa-arm, I'm wondering, just to be on the safe side, could you possibly run a quick check on your end to make sure that you're not encountering any issues with "sve-fixed-length-masked-gather.ll", either? That, and perhaps even add a RUN line with `llc --force-sve-when-streaming-compatible -aarch64-sve-vector-bits-min=128` to that file (as it has caused an ICE with 128-bit SVE compilation before), if that would be no trouble? @paulwalker-arm, If this patch gets accepted and the above test works fine perhaps that would offer a way to close https://github.com/llvm/llvm-project/issues/56412? Matt: > This is going to enable a lot more code paths than it currently tested. Out of curiosity…
		MattUnsubmitted Not Done Reply Inline Actions Update: I've tested on the whole file and the ICE does appear, after all. The difference is that now the affected function is `masked_gather_v8f16` (whereas previously compiling `masked_gather_v2f16` alone was sufficient to trigger the ICE--now it no longer does). After compiling with `llc --force-sve-when-streaming-compatible -aarch64-sve-vector-bits-min=128 -mtriple=arm64-unknown-unknown -mcpu=neoverse-n2 sve-fixed-length-masked-gather.ll`: PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. Stack dump: 0. Program arguments: llc --force-sve-when-streaming-compatible -aarch64-sve-vector-bits-min=128 -mtriple=arm64-unknown-unknown -mcpu=neoverse-n2 sve-fixed-length-masked-gather.ll 1. Running pass 'Function Pass Manager' on module 'sve-fixed-length-masked-gather.ll'. 2. Running pass 'AArch64 Instruction Selection' on function '@masked_gather_v8f16' #0 0x00007f1e7892db26 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /path/to/llvm-project/llvm/lib/Support/Unix/Signals.inc:573:3 #1 0x00007f1e7892b9ad llvm::sys::RunSignalHandlers() /path/to/llvm-project/llvm/lib/Support/Signals.cpp:103:20 #2 0x00007f1e7892bb2c SignalHandler(int) /path/to/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1 #3 0x00007f1e77a42210 __restore_rt (/lib64/libc.so.6+0x3a210) #4 0x00007f1e7bec836c llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:503:33 #5 0x00007f1e7bec836c llvm::DAGTypeLegalizer::AnalyzeNewValue(llvm::SDValue&) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:571:14 #6 0x00007f1e7bec847b llvm::SDValue::getNode() const /path/to/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:159:36 #7 0x00007f1e7bec847b llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode) (.part.0) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:525:32 #8 0x00007f1e7bec8371 
llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:503:33 #9 0x00007f1e7bec8371 llvm::DAGTypeLegalizer::AnalyzeNewValue(llvm::SDValue&) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:571:14 #10 0x00007f1e7bec847b llvm::SDValue::getNode() const /path/to/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:159:36 #11 0x00007f1e7bec847b llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode) (.part.0) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:525:32 #12 0x00007f1e7bec8371 llvm::DAGTypeLegalizer::AnalyzeNewNode(llvm::SDNode) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:503:33 #13 0x00007f1e7bec8371 llvm::DAGTypeLegalizer::AnalyzeNewValue(llvm::SDValue&) /path/to/llvm-project/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:571:14 #14 0x00007f1e7bec847b llvm::SDValue::getNode() const /path/to/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:159:36 . . . (the remaining part similarly recurring as in the aforementioned GitHub issue). Matt:* Update: I've tested on the whole file and the ICE does appear, after all. The difference is…
		return hasSVE();
sdesmalen (Done): Should this always return `true` and instead have `assert(hasSVE() && "Expected SVE to be available")`? If someone forces using streaming-compatible code, SVE must be available (and given that it's not a user-exposed feature in Clang, it's fine for the compiler to crash if someone uses this feature while forgetting to set `+sve` somehow).
paulwalker-arm (Not Done): I guess this should match the code below, although I'm not quite sure why the assert is needed.
hassnaa-arm (Author, Done): If someone forces using streaming-compatible code, SVE must be available (and given that it's not a user-exposed feature in Clang, it's fine for the compiler to crash if someone uses this feature while forgetting to set `+sve` somehow).
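The two options weighed in this thread can be contrasted in a small standalone sketch. `AssertDemoSubtarget` and its two query methods are hypothetical stand-ins invented for illustration, not the real `AArch64Subtarget` API; only `hasSVE()` and the assert message are taken from the comments above.

```cpp
#include <cassert>

// Hypothetical stand-in for the subtarget; not the real AArch64Subtarget.
struct AssertDemoSubtarget {
  bool HasSVE;

  bool hasSVE() const { return HasSVE; }

  // Option 1 (the line under review): quietly answer "no" when SVE is absent.
  bool forcedQueryReturningHasSVE() const { return hasSVE(); }

  // Option 2 (the reviewer's suggestion): treat a missing +sve as a usage
  // error and always answer "yes". In builds with assertions enabled this
  // aborts when SVE is absent; with NDEBUG it simply returns true.
  bool forcedQueryWithAssert() const {
    assert(hasSVE() && "Expected SVE to be available");
    return true;
  }
};
```

With SVE present both variants agree; they differ only when the flag is forced on a target without `+sve`, where the first silently falls back and the second fails loudly in assertion-enabled builds.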

		// Prefer NEON unless larger SVE registers are available.
paulwalker-arm (Done): `if (forceStreamingCompatibleSVE()) return true;` Doing this might mean `useSVEForFixedLengthVectors` can remain defined in the header file.
		return hasSVE() && getMinSVEVectorSizeInBits() >= 256;
		}
sdesmalen (Done): nit: `forceStreamingCompatibleSVE` (see other comment above)
paulwalker-arm (Not Done): Should this also include `|| hasSME()`?
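The early-return shape suggested in the comments above could look roughly like the following sketch. `FixedLengthDemoSubtarget` and its data members are assumptions made for illustration; the method names and the 256-bit threshold come from the diff and the review comments, but this is not the committed implementation.

```cpp
#include <cassert>

// Hypothetical stand-in for the subtarget; field names are illustrative.
struct FixedLengthDemoSubtarget {
  bool HasSVE;
  bool ForceStreamingCompatibleSVE; // models the hidden llc flag
  unsigned MinSVEVectorSizeInBits;

  bool hasSVE() const { return HasSVE; }
  unsigned getMinSVEVectorSizeInBits() const { return MinSVEVectorSizeInBits; }

  bool forceStreamingCompatibleSVE() const {
    if (!ForceStreamingCompatibleSVE)
      return false;
    // Forcing streaming-compatible code only makes sense with SVE present.
    assert(hasSVE() && "Expected SVE to be available");
    return true;
  }

  bool useSVEForFixedLengthVectors() const {
    // Early return: when forced, use SVE even for 128-bit vectors.
    if (forceStreamingCompatibleSVE())
      return true;
    // Prefer NEON unless larger SVE registers are available.
    return hasSVE() && getMinSVEVectorSizeInBits() >= 256;
  }
};
```

Because the forced case is handled by a single call, the body stays short enough to live inline in the header, which is the point of the suggestion.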

llvm/test/CodeGen/AArch64/sve-fixed-length-masked-stores.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
		; RUN: llc --force-sve-128bit-vector -aarch64-sve-vector-bits-min=128 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_SVE_128
		; RUN: llc -aarch64-sve-vector-bits-min=128 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_128
; RUN: llc -aarch64-sve-vector-bits-min=256 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_256		; RUN: llc -aarch64-sve-vector-bits-min=256 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_256
; RUN: llc -aarch64-sve-vector-bits-min=512 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512		; RUN: llc -aarch64-sve-vector-bits-min=512 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512
; RUN: llc -aarch64-sve-vector-bits-min=2048 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512		; RUN: llc -aarch64-sve-vector-bits-min=2048 < %s | FileCheck %s -check-prefixes=CHECK,VBITS_GE_512

target triple = "aarch64-unknown-linux-gnu"		target triple = "aarch64-unknown-linux-gnu"

;		;
; Masked Stores		; Masked Stores
;		;

		; store v16i8
		define void @masked_store_v16i8(<16 x i8>* %dst, <16 x i1> %mask) #0 {
		; VBITS_GE_SVE_128-LABEL: masked_store_v16i8:
		; VBITS_GE_SVE_128: // %bb.0:
		; VBITS_GE_SVE_128-NEXT: shl v0.16b, v0.16b, #7
		; VBITS_GE_SVE_128-NEXT: ptrue p0.b, vl16
		; VBITS_GE_SVE_128-NEXT: movi v1.2d, #0000000000000000
		; VBITS_GE_SVE_128-NEXT: cmlt v0.16b, v0.16b, #0
		; VBITS_GE_SVE_128-NEXT: cmpne p0.b, p0/z, z0.b, #0
		; VBITS_GE_SVE_128-NEXT: st1b { z1.b }, p0, [x0]
		; VBITS_GE_SVE_128-NEXT: ret
		;
		; VBITS_GE_128-LABEL: masked_store_v16i8:
		; VBITS_GE_128: // %bb.0:
		; VBITS_GE_128-NEXT: sub sp, sp, #16
		; VBITS_GE_128-NEXT: .cfi_def_cfa_offset 16
		; VBITS_GE_128-NEXT: umov w8, v0.b[1]
		; VBITS_GE_128-NEXT: umov w10, v0.b[2]
		; VBITS_GE_128-NEXT: umov w9, v0.b[0]
		; VBITS_GE_128-NEXT: umov w11, v0.b[3]
		; VBITS_GE_128-NEXT: umov w12, v0.b[4]
		; VBITS_GE_128-NEXT: umov w13, v0.b[5]
		; VBITS_GE_128-NEXT: and w8, w8, #0x1
		; VBITS_GE_128-NEXT: and w10, w10, #0x1
		; VBITS_GE_128-NEXT: and w9, w9, #0x1
		; VBITS_GE_128-NEXT: and w11, w11, #0x1
		; VBITS_GE_128-NEXT: and w12, w12, #0x1
		; VBITS_GE_128-NEXT: and w13, w13, #0x1
		; VBITS_GE_128-NEXT: bfi w9, w8, #1, #1
		; VBITS_GE_128-NEXT: umov w8, v0.b[6]
		; VBITS_GE_128-NEXT: bfi w9, w10, #2, #1
		; VBITS_GE_128-NEXT: umov w10, v0.b[7]
		; VBITS_GE_128-NEXT: bfi w9, w11, #3, #1
		; VBITS_GE_128-NEXT: umov w11, v0.b[8]
		; VBITS_GE_128-NEXT: bfi w9, w12, #4, #1
		; VBITS_GE_128-NEXT: umov w12, v0.b[9]
		; VBITS_GE_128-NEXT: and w8, w8, #0x1
		; VBITS_GE_128-NEXT: bfi w9, w13, #5, #1
		; VBITS_GE_128-NEXT: umov w13, v0.b[10]
		; VBITS_GE_128-NEXT: and w10, w10, #0x1
		; VBITS_GE_128-NEXT: orr w8, w9, w8, lsl #6
		; VBITS_GE_128-NEXT: umov w9, v0.b[11]
		; VBITS_GE_128-NEXT: and w11, w11, #0x1
		; VBITS_GE_128-NEXT: orr w8, w8, w10, lsl #7
		; VBITS_GE_128-NEXT: umov w10, v0.b[12]
		; VBITS_GE_128-NEXT: and w12, w12, #0x1
		; VBITS_GE_128-NEXT: orr w8, w8, w11, lsl #8
		; VBITS_GE_128-NEXT: umov w11, v0.b[13]
		; VBITS_GE_128-NEXT: and w13, w13, #0x1
		; VBITS_GE_128-NEXT: orr w8, w8, w12, lsl #9
		; VBITS_GE_128-NEXT: umov w12, v0.b[14]
		; VBITS_GE_128-NEXT: and w9, w9, #0x1
		; VBITS_GE_128-NEXT: orr w8, w8, w13, lsl #10
		; VBITS_GE_128-NEXT: and w10, w10, #0x1
		; VBITS_GE_128-NEXT: orr w8, w8, w9, lsl #11
		; VBITS_GE_128-NEXT: and w9, w11, #0x1
		; VBITS_GE_128-NEXT: umov w11, v0.b[15]
		; VBITS_GE_128-NEXT: orr w8, w8, w10, lsl #12
		; VBITS_GE_128-NEXT: and w10, w12, #0x1
		; VBITS_GE_128-NEXT: orr w8, w8, w9, lsl #13
		; VBITS_GE_128-NEXT: orr w8, w8, w10, lsl #14
		; VBITS_GE_128-NEXT: orr w9, w8, w11, lsl #15
		; VBITS_GE_128-NEXT: and w8, w9, #0xffff
		; VBITS_GE_128-NEXT: tbnz w9, #0, .LBB0_17
		; VBITS_GE_128-NEXT: // %bb.1: // %else
		; VBITS_GE_128-NEXT: tbnz w8, #1, .LBB0_18
		; VBITS_GE_128-NEXT: .LBB0_2: // %else2
		; VBITS_GE_128-NEXT: tbnz w8, #2, .LBB0_19
		; VBITS_GE_128-NEXT: .LBB0_3: // %else4
		; VBITS_GE_128-NEXT: tbnz w8, #3, .LBB0_20
		; VBITS_GE_128-NEXT: .LBB0_4: // %else6
		; VBITS_GE_128-NEXT: tbnz w8, #4, .LBB0_21
		; VBITS_GE_128-NEXT: .LBB0_5: // %else8
		; VBITS_GE_128-NEXT: tbnz w8, #5, .LBB0_22
		; VBITS_GE_128-NEXT: .LBB0_6: // %else10
		; VBITS_GE_128-NEXT: tbnz w8, #6, .LBB0_23
		; VBITS_GE_128-NEXT: .LBB0_7: // %else12
		; VBITS_GE_128-NEXT: tbnz w8, #7, .LBB0_24
		; VBITS_GE_128-NEXT: .LBB0_8: // %else14
		; VBITS_GE_128-NEXT: tbnz w8, #8, .LBB0_25
		; VBITS_GE_128-NEXT: .LBB0_9: // %else16
		; VBITS_GE_128-NEXT: tbnz w8, #9, .LBB0_26
		; VBITS_GE_128-NEXT: .LBB0_10: // %else18
		; VBITS_GE_128-NEXT: tbnz w8, #10, .LBB0_27
		; VBITS_GE_128-NEXT: .LBB0_11: // %else20
		; VBITS_GE_128-NEXT: tbnz w8, #11, .LBB0_28
		; VBITS_GE_128-NEXT: .LBB0_12: // %else22
		; VBITS_GE_128-NEXT: tbnz w8, #12, .LBB0_29
		; VBITS_GE_128-NEXT: .LBB0_13: // %else24
		; VBITS_GE_128-NEXT: tbnz w8, #13, .LBB0_30
		; VBITS_GE_128-NEXT: .LBB0_14: // %else26
		; VBITS_GE_128-NEXT: tbnz w8, #14, .LBB0_31
		; VBITS_GE_128-NEXT: .LBB0_15: // %else28
		; VBITS_GE_128-NEXT: tbnz w8, #15, .LBB0_32
		; VBITS_GE_128-NEXT: .LBB0_16: // %else30
		; VBITS_GE_128-NEXT: add sp, sp, #16
		; VBITS_GE_128-NEXT: ret
		; VBITS_GE_128-NEXT: .LBB0_17: // %cond.store
		; VBITS_GE_128-NEXT: strb wzr, [x0]
		; VBITS_GE_128-NEXT: tbz w8, #1, .LBB0_2
		; VBITS_GE_128-NEXT: .LBB0_18: // %cond.store1
		; VBITS_GE_128-NEXT: strb wzr, [x0, #1]
		; VBITS_GE_128-NEXT: tbz w8, #2, .LBB0_3
		; VBITS_GE_128-NEXT: .LBB0_19: // %cond.store3
		; VBITS_GE_128-NEXT: strb wzr, [x0, #2]
		; VBITS_GE_128-NEXT: tbz w8, #3, .LBB0_4
		; VBITS_GE_128-NEXT: .LBB0_20: // %cond.store5
		; VBITS_GE_128-NEXT: strb wzr, [x0, #3]
		; VBITS_GE_128-NEXT: tbz w8, #4, .LBB0_5
		; VBITS_GE_128-NEXT: .LBB0_21: // %cond.store7
		; VBITS_GE_128-NEXT: strb wzr, [x0, #4]
		; VBITS_GE_128-NEXT: tbz w8, #5, .LBB0_6
		; VBITS_GE_128-NEXT: .LBB0_22: // %cond.store9
		; VBITS_GE_128-NEXT: strb wzr, [x0, #5]
		; VBITS_GE_128-NEXT: tbz w8, #6, .LBB0_7
		; VBITS_GE_128-NEXT: .LBB0_23: // %cond.store11
		; VBITS_GE_128-NEXT: strb wzr, [x0, #6]
		; VBITS_GE_128-NEXT: tbz w8, #7, .LBB0_8
		; VBITS_GE_128-NEXT: .LBB0_24: // %cond.store13
		; VBITS_GE_128-NEXT: strb wzr, [x0, #7]
		; VBITS_GE_128-NEXT: tbz w8, #8, .LBB0_9
		; VBITS_GE_128-NEXT: .LBB0_25: // %cond.store15
		; VBITS_GE_128-NEXT: strb wzr, [x0, #8]
		; VBITS_GE_128-NEXT: tbz w8, #9, .LBB0_10
		; VBITS_GE_128-NEXT: .LBB0_26: // %cond.store17
		; VBITS_GE_128-NEXT: strb wzr, [x0, #9]
		; VBITS_GE_128-NEXT: tbz w8, #10, .LBB0_11
		; VBITS_GE_128-NEXT: .LBB0_27: // %cond.store19
		; VBITS_GE_128-NEXT: strb wzr, [x0, #10]
		; VBITS_GE_128-NEXT: tbz w8, #11, .LBB0_12
		; VBITS_GE_128-NEXT: .LBB0_28: // %cond.store21
		; VBITS_GE_128-NEXT: strb wzr, [x0, #11]
		; VBITS_GE_128-NEXT: tbz w8, #12, .LBB0_13
		; VBITS_GE_128-NEXT: .LBB0_29: // %cond.store23
		; VBITS_GE_128-NEXT: strb wzr, [x0, #12]
		; VBITS_GE_128-NEXT: tbz w8, #13, .LBB0_14
		; VBITS_GE_128-NEXT: .LBB0_30: // %cond.store25
		; VBITS_GE_128-NEXT: strb wzr, [x0, #13]
		; VBITS_GE_128-NEXT: tbz w8, #14, .LBB0_15
		; VBITS_GE_128-NEXT: .LBB0_31: // %cond.store27
		; VBITS_GE_128-NEXT: strb wzr, [x0, #14]
		; VBITS_GE_128-NEXT: tbz w8, #15, .LBB0_16
		; VBITS_GE_128-NEXT: .LBB0_32: // %cond.store29
		; VBITS_GE_128-NEXT: strb wzr, [x0, #15]
		; VBITS_GE_128-NEXT: add sp, sp, #16
		; VBITS_GE_128-NEXT: ret
		;
		; VBITS_GE_256-LABEL: masked_store_v16i8:
		; VBITS_GE_256: // %bb.0:
		; VBITS_GE_256-NEXT: shl v0.16b, v0.16b, #7
		; VBITS_GE_256-NEXT: ptrue p0.b, vl16
		; VBITS_GE_256-NEXT: movi v1.2d, #0000000000000000
		; VBITS_GE_256-NEXT: cmlt v0.16b, v0.16b, #0
		; VBITS_GE_256-NEXT: cmpne p0.b, p0/z, z0.b, #0
		; VBITS_GE_256-NEXT: st1b { z1.b }, p0, [x0]
		; VBITS_GE_256-NEXT: ret
		;
		; VBITS_GE_512-LABEL: masked_store_v16i8:
		; VBITS_GE_512: // %bb.0:
		; VBITS_GE_512-NEXT: shl v0.16b, v0.16b, #7
		; VBITS_GE_512-NEXT: ptrue p0.b, vl16
		; VBITS_GE_512-NEXT: movi v1.2d, #0000000000000000
		; VBITS_GE_512-NEXT: cmlt v0.16b, v0.16b, #0
		; VBITS_GE_512-NEXT: cmpne p0.b, p0/z, z0.b, #0
		; VBITS_GE_512-NEXT: st1b { z1.b }, p0, [x0]
		; VBITS_GE_512-NEXT: ret
		call void @llvm.masked.store.v16i8(<16 x i8> zeroinitializer, <16 x i8>* %dst, i32 8, <16 x i1> %mask)
		ret void
		}
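
For readers comparing the two 128-bit outputs above: in the `VBITS_GE_128` lines the mask is packed into a scalar register (`umov`/`and`/`bfi`/`orr`) and each lane is stored behind its own bit test (`tbnz`/`strb wzr`), whereas the `VBITS_GE_SVE_128` lines do the same work with one predicated `st1b`. A rough C++ model of the scalarized path, with hypothetical helper names, might look like this:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Pack 16 one-bit lanes into an integer, lane I -> bit I. This mirrors the
// umov/and/bfi/orr sequence in the VBITS_GE_128 check lines.
inline std::uint16_t packMask(const std::array<bool, 16> &Lanes) {
  std::uint16_t Bits = 0;
  for (std::size_t I = 0; I < Lanes.size(); ++I)
    if (Lanes[I])
      Bits = static_cast<std::uint16_t>(Bits | (1u << I));
  return Bits;
}

// Scalarized masked store of zeroinitializer: one conditional byte store per
// lane, like the tbnz / strb wzr chain. Lanes whose bit is clear are untouched.
inline void maskedStoreZeroScalarized(std::uint8_t *Dst, std::uint16_t Bits) {
  assert(Dst && "destination must be non-null");
  for (unsigned I = 0; I < 16; ++I)
    if (Bits & (1u << I))
      Dst[I] = 0;
}
```

This is only a behavioural model of `@llvm.masked.store.v16i8` with a zero value operand; it says nothing about how the backend selects either lowering.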

define void @masked_store_v2f16(<2 x half>* %ap, <2 x half>* %bp) vscale_range(2,0) #0 {		define void @masked_store_v2f16(<2 x half>* %ap, <2 x half>* %bp) vscale_range(2,0) #0 {
; CHECK-LABEL: masked_store_v2f16:		; CHECK-LABEL: masked_store_v2f16:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: ldr s1, [x0]		; CHECK-NEXT: ldr s1, [x0]
; CHECK-NEXT: movi d0, #0000000000000000		; CHECK-NEXT: movi d0, #0000000000000000
; CHECK-NEXT: ldr s2, [x1]		; CHECK-NEXT: ldr s2, [x1]
; CHECK-NEXT: ptrue p0.h, vl4		; CHECK-NEXT: ptrue p0.h, vl4
; CHECK-NEXT: fcmeq v2.4h, v1.4h, v2.4h		; CHECK-NEXT: fcmeq v2.4h, v1.4h, v2.4h
; [60 lines collapsed in the diff view]
		; CHECK-NEXT: ret
%a = load <8 x float>, <8 x float>* %ap		%a = load <8 x float>, <8 x float>* %ap
%b = load <8 x float>, <8 x float>* %bp		%b = load <8 x float>, <8 x float>* %bp
%mask = fcmp oeq <8 x float> %a, %b		%mask = fcmp oeq <8 x float> %a, %b
call void @llvm.masked.store.v8f32(<8 x float> %a, <8 x float>* %bp, i32 8, <8 x i1> %mask)		call void @llvm.masked.store.v8f32(<8 x float> %a, <8 x float>* %bp, i32 8, <8 x i1> %mask)
ret void		ret void
}		}

define void @masked_store_v16f32(<16 x float>* %ap, <16 x float>* %bp) #0 {		define void @masked_store_v16f32(<16 x float>* %ap, <16 x float>* %bp) #0 {
		; VBITS_GE_SVE_128-LABEL: masked_store_v16f32:
		; VBITS_GE_SVE_128: // %bb.0:
		; VBITS_GE_SVE_128-NEXT: ldp q0, q1, [x0]
		; VBITS_GE_SVE_128-NEXT: mov x8, #8
		; VBITS_GE_SVE_128-NEXT: mov x9, #12
		; VBITS_GE_SVE_128-NEXT: mov x10, #4
		; VBITS_GE_SVE_128-NEXT: ptrue p0.s, vl4
		; VBITS_GE_SVE_128-NEXT: ldp q3, q2, [x1]
		; VBITS_GE_SVE_128-NEXT: fcmeq v3.4s, v0.4s, v3.4s
		; VBITS_GE_SVE_128-NEXT: ldp q5, q4, [x1, #32]
		; VBITS_GE_SVE_128-NEXT: fcmeq v2.4s, v1.4s, v2.4s
		; VBITS_GE_SVE_128-NEXT: cmpne p1.s, p0/z, z3.s, #0
		; VBITS_GE_SVE_128-NEXT: cmpne p2.s, p0/z, z2.s, #0
		; VBITS_GE_SVE_128-NEXT: ldp q6, q7, [x0, #32]
		; VBITS_GE_SVE_128-NEXT: fcmeq v5.4s, v6.4s, v5.4s
		; VBITS_GE_SVE_128-NEXT: fcmeq v4.4s, v7.4s, v4.4s
		; VBITS_GE_SVE_128-NEXT: cmpne p3.s, p0/z, z5.s, #0
		; VBITS_GE_SVE_128-NEXT: cmpne p0.s, p0/z, z4.s, #0
		; VBITS_GE_SVE_128-NEXT: st1w { z7.s }, p0, [x0, x9, lsl #2]
		; VBITS_GE_SVE_128-NEXT: st1w { z6.s }, p3, [x0, x8, lsl #2]
		; VBITS_GE_SVE_128-NEXT: st1w { z1.s }, p2, [x0, x10, lsl #2]
		; VBITS_GE_SVE_128-NEXT: st1w { z0.s }, p1, [x0]
		; VBITS_GE_SVE_128-NEXT: ret
		;
		; VBITS_GE_128-LABEL: masked_store_v16f32:
		; VBITS_GE_128: // %bb.0:
		; VBITS_GE_128-NEXT: sub sp, sp, #16
		; VBITS_GE_128-NEXT: .cfi_def_cfa_offset 16
		; VBITS_GE_128-NEXT: ldp q3, q2, [x0]
		; VBITS_GE_128-NEXT: ldp q1, q0, [x1]
		; VBITS_GE_128-NEXT: fcmeq v1.4s, v3.4s, v1.4s
		; VBITS_GE_128-NEXT: fcmeq v4.4s, v2.4s, v0.4s
		; VBITS_GE_128-NEXT: ldp q6, q5, [x1, #32]
		; VBITS_GE_128-NEXT: uzp1 v4.8h, v1.8h, v4.8h
		; VBITS_GE_128-NEXT: ldp q1, q0, [x0, #32]
		; VBITS_GE_128-NEXT: xtn v4.8b, v4.8h
		; VBITS_GE_128-NEXT: umov w8, v4.b[1]
		; VBITS_GE_128-NEXT: umov w10, v4.b[2]
		; VBITS_GE_128-NEXT: fcmeq v6.4s, v1.4s, v6.4s
		; VBITS_GE_128-NEXT: umov w9, v4.b[0]
		; VBITS_GE_128-NEXT: umov w11, v4.b[3]
		; VBITS_GE_128-NEXT: fcmeq v5.4s, v0.4s, v5.4s
		; VBITS_GE_128-NEXT: umov w12, v4.b[4]
		; VBITS_GE_128-NEXT: and w8, w8, #0x1
		; VBITS_GE_128-NEXT: umov w13, v4.b[5]
		; VBITS_GE_128-NEXT: and w10, w10, #0x1
		; VBITS_GE_128-NEXT: and w9, w9, #0x1
		; VBITS_GE_128-NEXT: and w11, w11, #0x1
		; VBITS_GE_128-NEXT: uzp1 v5.8h, v6.8h, v5.8h
		; VBITS_GE_128-NEXT: bfi w9, w8, #1, #1
		; VBITS_GE_128-NEXT: umov w8, v4.b[6]
		; VBITS_GE_128-NEXT: and w12, w12, #0x1
		; VBITS_GE_128-NEXT: bfi w9, w10, #2, #1
		; VBITS_GE_128-NEXT: umov w10, v4.b[7]
		; VBITS_GE_128-NEXT: and w13, w13, #0x1
		; VBITS_GE_128-NEXT: bfi w9, w11, #3, #1
		; VBITS_GE_128-NEXT: xtn v5.8b, v5.8h
		; VBITS_GE_128-NEXT: bfi w9, w12, #4, #1
		; VBITS_GE_128-NEXT: and w8, w8, #0x1
		; VBITS_GE_128-NEXT: umov w11, v5.b[0]
		; VBITS_GE_128-NEXT: umov w12, v5.b[1]
		; VBITS_GE_128-NEXT: bfi w9, w13, #5, #1
		; VBITS_GE_128-NEXT: umov w13, v5.b[2]
		; VBITS_GE_128-NEXT: and w10, w10, #0x1
		; VBITS_GE_128-NEXT: orr w8, w9, w8, lsl #6
		; VBITS_GE_128-NEXT: umov w9, v5.b[3]
		; VBITS_GE_128-NEXT: and w11, w11, #0x1
		; VBITS_GE_128-NEXT: orr w8, w8, w10, lsl #7
		; VBITS_GE_128-NEXT: umov w10, v5.b[4]
		; VBITS_GE_128-NEXT: and w12, w12, #0x1
		; VBITS_GE_128-NEXT: orr w8, w8, w11, lsl #8
		; VBITS_GE_128-NEXT: umov w11, v5.b[5]
		; VBITS_GE_128-NEXT: and w13, w13, #0x1
		; VBITS_GE_128-NEXT: orr w8, w8, w12, lsl #9
		; VBITS_GE_128-NEXT: umov w12, v5.b[6]
		; VBITS_GE_128-NEXT: and w9, w9, #0x1
		; VBITS_GE_128-NEXT: orr w8, w8, w13, lsl #10
		; VBITS_GE_128-NEXT: and w10, w10, #0x1
		; VBITS_GE_128-NEXT: orr w8, w8, w9, lsl #11
		; VBITS_GE_128-NEXT: and w9, w11, #0x1
		; VBITS_GE_128-NEXT: umov w11, v5.b[7]
		; VBITS_GE_128-NEXT: orr w8, w8, w10, lsl #12
		; VBITS_GE_128-NEXT: and w10, w12, #0x1
		; VBITS_GE_128-NEXT: orr w8, w8, w9, lsl #13
		; VBITS_GE_128-NEXT: orr w8, w8, w10, lsl #14
		; VBITS_GE_128-NEXT: orr w9, w8, w11, lsl #15
		; VBITS_GE_128-NEXT: and w8, w9, #0xffff
		; VBITS_GE_128-NEXT: tbnz w9, #0, .LBB5_17
		; VBITS_GE_128-NEXT: // %bb.1: // %else
		; VBITS_GE_128-NEXT: tbnz w8, #1, .LBB5_18
		; VBITS_GE_128-NEXT: .LBB5_2: // %else2
		; VBITS_GE_128-NEXT: tbnz w8, #2, .LBB5_19
		; VBITS_GE_128-NEXT: .LBB5_3: // %else4
		; VBITS_GE_128-NEXT: tbnz w8, #3, .LBB5_20
		; VBITS_GE_128-NEXT: .LBB5_4: // %else6
		; VBITS_GE_128-NEXT: tbnz w8, #4, .LBB5_21
		; VBITS_GE_128-NEXT: .LBB5_5: // %else8
		; VBITS_GE_128-NEXT: tbnz w8, #5, .LBB5_22
		; VBITS_GE_128-NEXT: .LBB5_6: // %else10
		; VBITS_GE_128-NEXT: tbnz w8, #6, .LBB5_23
		; VBITS_GE_128-NEXT: .LBB5_7: // %else12
		; VBITS_GE_128-NEXT: tbnz w8, #7, .LBB5_24
		; VBITS_GE_128-NEXT: .LBB5_8: // %else14
		; VBITS_GE_128-NEXT: tbnz w8, #8, .LBB5_25
		; VBITS_GE_128-NEXT: .LBB5_9: // %else16
		; VBITS_GE_128-NEXT: tbnz w8, #9, .LBB5_26
		; VBITS_GE_128-NEXT: .LBB5_10: // %else18
		; VBITS_GE_128-NEXT: tbnz w8, #10, .LBB5_27
		; VBITS_GE_128-NEXT: .LBB5_11: // %else20
		; VBITS_GE_128-NEXT: tbnz w8, #11, .LBB5_28
		; VBITS_GE_128-NEXT: .LBB5_12: // %else22
		; VBITS_GE_128-NEXT: tbnz w8, #12, .LBB5_29
		; VBITS_GE_128-NEXT: .LBB5_13: // %else24
		; VBITS_GE_128-NEXT: tbnz w8, #13, .LBB5_30
		; VBITS_GE_128-NEXT: .LBB5_14: // %else26
		; VBITS_GE_128-NEXT: tbnz w8, #14, .LBB5_31
		; VBITS_GE_128-NEXT: .LBB5_15: // %else28
		; VBITS_GE_128-NEXT: tbnz w8, #15, .LBB5_32
		; VBITS_GE_128-NEXT: .LBB5_16: // %else30
		; VBITS_GE_128-NEXT: add sp, sp, #16
		; VBITS_GE_128-NEXT: ret
		; VBITS_GE_128-NEXT: .LBB5_17: // %cond.store
		; VBITS_GE_128-NEXT: str s3, [x0]
		; VBITS_GE_128-NEXT: tbz w8, #1, .LBB5_2
		; VBITS_GE_128-NEXT: .LBB5_18: // %cond.store1
		; VBITS_GE_128-NEXT: add x9, x0, #4
		; VBITS_GE_128-NEXT: st1 { v3.s }[1], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #2, .LBB5_3
		; VBITS_GE_128-NEXT: .LBB5_19: // %cond.store3
		; VBITS_GE_128-NEXT: add x9, x0, #8
		; VBITS_GE_128-NEXT: st1 { v3.s }[2], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #3, .LBB5_4
		; VBITS_GE_128-NEXT: .LBB5_20: // %cond.store5
		; VBITS_GE_128-NEXT: add x9, x0, #12
		; VBITS_GE_128-NEXT: st1 { v3.s }[3], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #4, .LBB5_5
		; VBITS_GE_128-NEXT: .LBB5_21: // %cond.store7
		; VBITS_GE_128-NEXT: str s2, [x0, #16]
		; VBITS_GE_128-NEXT: tbz w8, #5, .LBB5_6
		; VBITS_GE_128-NEXT: .LBB5_22: // %cond.store9
		; VBITS_GE_128-NEXT: add x9, x0, #20
		; VBITS_GE_128-NEXT: st1 { v2.s }[1], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #6, .LBB5_7
		; VBITS_GE_128-NEXT: .LBB5_23: // %cond.store11
		; VBITS_GE_128-NEXT: add x9, x0, #24
		; VBITS_GE_128-NEXT: st1 { v2.s }[2], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #7, .LBB5_8
		; VBITS_GE_128-NEXT: .LBB5_24: // %cond.store13
		; VBITS_GE_128-NEXT: add x9, x0, #28
		; VBITS_GE_128-NEXT: st1 { v2.s }[3], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #8, .LBB5_9
		; VBITS_GE_128-NEXT: .LBB5_25: // %cond.store15
		; VBITS_GE_128-NEXT: str s1, [x0, #32]
		; VBITS_GE_128-NEXT: tbz w8, #9, .LBB5_10
		; VBITS_GE_128-NEXT: .LBB5_26: // %cond.store17
		; VBITS_GE_128-NEXT: add x9, x0, #36
		; VBITS_GE_128-NEXT: st1 { v1.s }[1], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #10, .LBB5_11
		; VBITS_GE_128-NEXT: .LBB5_27: // %cond.store19
		; VBITS_GE_128-NEXT: add x9, x0, #40
		; VBITS_GE_128-NEXT: st1 { v1.s }[2], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #11, .LBB5_12
		; VBITS_GE_128-NEXT: .LBB5_28: // %cond.store21
		; VBITS_GE_128-NEXT: add x9, x0, #44
		; VBITS_GE_128-NEXT: st1 { v1.s }[3], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #12, .LBB5_13
		; VBITS_GE_128-NEXT: .LBB5_29: // %cond.store23
		; VBITS_GE_128-NEXT: str s0, [x0, #48]
		; VBITS_GE_128-NEXT: tbz w8, #13, .LBB5_14
		; VBITS_GE_128-NEXT: .LBB5_30: // %cond.store25
		; VBITS_GE_128-NEXT: add x9, x0, #52
		; VBITS_GE_128-NEXT: st1 { v0.s }[1], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #14, .LBB5_15
		; VBITS_GE_128-NEXT: .LBB5_31: // %cond.store27
		; VBITS_GE_128-NEXT: add x9, x0, #56
		; VBITS_GE_128-NEXT: st1 { v0.s }[2], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #15, .LBB5_16
		; VBITS_GE_128-NEXT: .LBB5_32: // %cond.store29
		; VBITS_GE_128-NEXT: add x8, x0, #60
		; VBITS_GE_128-NEXT: st1 { v0.s }[3], [x8]
		; VBITS_GE_128-NEXT: add sp, sp, #16
		; VBITS_GE_128-NEXT: ret
		;
; VBITS_GE_256-LABEL: masked_store_v16f32:		; VBITS_GE_256-LABEL: masked_store_v16f32:
; VBITS_GE_256: // %bb.0:		; VBITS_GE_256: // %bb.0:
; VBITS_GE_256-NEXT: mov x8, #8		; VBITS_GE_256-NEXT: mov x8, #8
; VBITS_GE_256-NEXT: ptrue p0.s, vl8		; VBITS_GE_256-NEXT: ptrue p0.s, vl8
; VBITS_GE_256-NEXT: ld1w { z0.s }, p0/z, [x0, x8, lsl #2]		; VBITS_GE_256-NEXT: ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
; VBITS_GE_256-NEXT: ld1w { z1.s }, p0/z, [x0]		; VBITS_GE_256-NEXT: ld1w { z1.s }, p0/z, [x0]
; VBITS_GE_256-NEXT: ld1w { z2.s }, p0/z, [x1, x8, lsl #2]		; VBITS_GE_256-NEXT: ld1w { z2.s }, p0/z, [x1, x8, lsl #2]
; VBITS_GE_256-NEXT: ld1w { z3.s }, p0/z, [x1]		; VBITS_GE_256-NEXT: ld1w { z3.s }, p0/z, [x1]
; [45 lines collapsed in the diff view]
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%a = load <64 x float>, <64 x float>* %ap		%a = load <64 x float>, <64 x float>* %ap
%b = load <64 x float>, <64 x float>* %bp		%b = load <64 x float>, <64 x float>* %bp
%mask = fcmp oeq <64 x float> %a, %b		%mask = fcmp oeq <64 x float> %a, %b
call void @llvm.masked.store.v64f32(<64 x float> %a, <64 x float>* %ap, i32 8, <64 x i1> %mask)		call void @llvm.masked.store.v64f32(<64 x float> %a, <64 x float>* %ap, i32 8, <64 x i1> %mask)
ret void		ret void
}		}

		; store v2f64
		define void @masked_store_v2f64(<2 x double>* %dst, <2 x i1> %mask) #0 {
		; VBITS_GE_SVE_128-LABEL: masked_store_v2f64:
		; VBITS_GE_SVE_128: // %bb.0:
		; VBITS_GE_SVE_128-NEXT: ushll v0.2d, v0.2s, #0
		; VBITS_GE_SVE_128-NEXT: ptrue p0.d, vl2
		; VBITS_GE_SVE_128-NEXT: movi v1.2d, #0000000000000000
		; VBITS_GE_SVE_128-NEXT: shl v0.2d, v0.2d, #63
		; VBITS_GE_SVE_128-NEXT: cmlt v0.2d, v0.2d, #0
		; VBITS_GE_SVE_128-NEXT: cmpne p0.d, p0/z, z0.d, #0
		; VBITS_GE_SVE_128-NEXT: st1d { z1.d }, p0, [x0]
		; VBITS_GE_SVE_128-NEXT: ret
		;
		; VBITS_GE_128-LABEL: masked_store_v2f64:
		; VBITS_GE_128: // %bb.0:
		; VBITS_GE_128-NEXT: sub sp, sp, #16
		; VBITS_GE_128-NEXT: .cfi_def_cfa_offset 16
		; VBITS_GE_128-NEXT: // kill: def $d0 killed $d0 def $q0
		; VBITS_GE_128-NEXT: mov w8, v0.s[1]
		; VBITS_GE_128-NEXT: fmov w9, s0
		; VBITS_GE_128-NEXT: bfi w9, w8, #1, #31
		; VBITS_GE_128-NEXT: and w8, w9, #0x3
		; VBITS_GE_128-NEXT: tbnz w9, #0, .LBB8_3
		; VBITS_GE_128-NEXT: // %bb.1: // %else
		; VBITS_GE_128-NEXT: tbnz w8, #1, .LBB8_4
		; VBITS_GE_128-NEXT: .LBB8_2: // %else2
		; VBITS_GE_128-NEXT: add sp, sp, #16
		; VBITS_GE_128-NEXT: ret
		; VBITS_GE_128-NEXT: .LBB8_3: // %cond.store
		; VBITS_GE_128-NEXT: str xzr, [x0]
		; VBITS_GE_128-NEXT: tbz w8, #1, .LBB8_2
		; VBITS_GE_128-NEXT: .LBB8_4: // %cond.store1
		; VBITS_GE_128-NEXT: str xzr, [x0, #8]
		; VBITS_GE_128-NEXT: add sp, sp, #16
		; VBITS_GE_128-NEXT: ret
		;
		; VBITS_GE_256-LABEL: masked_store_v2f64:
		; VBITS_GE_256: // %bb.0:
		; VBITS_GE_256-NEXT: ushll v0.2d, v0.2s, #0
		; VBITS_GE_256-NEXT: ptrue p0.d, vl2
		; VBITS_GE_256-NEXT: movi v1.2d, #0000000000000000
		; VBITS_GE_256-NEXT: shl v0.2d, v0.2d, #63
		; VBITS_GE_256-NEXT: cmlt v0.2d, v0.2d, #0
		; VBITS_GE_256-NEXT: cmpne p0.d, p0/z, z0.d, #0
		; VBITS_GE_256-NEXT: st1d { z1.d }, p0, [x0]
		; VBITS_GE_256-NEXT: ret
		;
		; VBITS_GE_512-LABEL: masked_store_v2f64:
		; VBITS_GE_512: // %bb.0:
		; VBITS_GE_512-NEXT: ushll v0.2d, v0.2s, #0
		; VBITS_GE_512-NEXT: ptrue p0.d, vl2
		; VBITS_GE_512-NEXT: movi v1.2d, #0000000000000000
		; VBITS_GE_512-NEXT: shl v0.2d, v0.2d, #63
		; VBITS_GE_512-NEXT: cmlt v0.2d, v0.2d, #0
		; VBITS_GE_512-NEXT: cmpne p0.d, p0/z, z0.d, #0
		; VBITS_GE_512-NEXT: st1d { z1.d }, p0, [x0]
		; VBITS_GE_512-NEXT: ret
		call void @llvm.masked.store.v2f64(<2 x double> zeroinitializer, <2 x double>* %dst, i32 8, <2 x i1> %mask)
		ret void
		}

define void @masked_store_trunc_v8i64i8(<8 x i64>* %ap, <8 x i64>* %bp, <8 x i8>* %dest) #0 {		define void @masked_store_trunc_v8i64i8(<8 x i64>* %ap, <8 x i64>* %bp, <8 x i8>* %dest) #0 {
		; VBITS_GE_SVE_128-LABEL: masked_store_trunc_v8i64i8:
		; VBITS_GE_SVE_128: // %bb.0:
		; VBITS_GE_SVE_128-NEXT: ldp q2, q3, [x0, #32]
		; VBITS_GE_SVE_128-NEXT: adrp x8, .LCPI9_0
		; VBITS_GE_SVE_128-NEXT: ptrue p0.b, vl8
		; VBITS_GE_SVE_128-NEXT: ldp q4, q5, [x1, #32]
		; VBITS_GE_SVE_128-NEXT: xtn v23.2s, v3.2d
		; VBITS_GE_SVE_128-NEXT: xtn v22.2s, v2.2d
		; VBITS_GE_SVE_128-NEXT: cmeq v4.2d, v2.2d, v4.2d
		; VBITS_GE_SVE_128-NEXT: ldp q0, q1, [x0]
		; VBITS_GE_SVE_128-NEXT: cmeq v5.2d, v3.2d, v5.2d
		; VBITS_GE_SVE_128-NEXT: xtn v19.2s, v5.2d
		; VBITS_GE_SVE_128-NEXT: xtn v18.2s, v4.2d
		; VBITS_GE_SVE_128-NEXT: ldp q6, q7, [x1]
		; VBITS_GE_SVE_128-NEXT: xtn v21.2s, v1.2d
		; VBITS_GE_SVE_128-NEXT: xtn v20.2s, v0.2d
		; VBITS_GE_SVE_128-NEXT: cmeq v6.2d, v0.2d, v6.2d
		; VBITS_GE_SVE_128-NEXT: cmeq v7.2d, v1.2d, v7.2d
		; VBITS_GE_SVE_128-NEXT: ldr d2, [x8, :lo12:.LCPI9_0]
		; VBITS_GE_SVE_128-NEXT: xtn v17.2s, v7.2d
		; VBITS_GE_SVE_128-NEXT: xtn v16.2s, v6.2d
		; VBITS_GE_SVE_128-NEXT: tbl v1.8b, { v20.16b, v21.16b, v22.16b, v23.16b }, v2.8b
		; VBITS_GE_SVE_128-NEXT: tbl v0.8b, { v16.16b, v17.16b, v18.16b, v19.16b }, v2.8b
		; VBITS_GE_SVE_128-NEXT: cmpne p0.b, p0/z, z0.b, #0
		; VBITS_GE_SVE_128-NEXT: st1b { z1.b }, p0, [x2]
		; VBITS_GE_SVE_128-NEXT: ret
		;
		; VBITS_GE_128-LABEL: masked_store_trunc_v8i64i8:
		; VBITS_GE_128: // %bb.0:
		; VBITS_GE_128-NEXT: sub sp, sp, #16
		; VBITS_GE_128-NEXT: .cfi_def_cfa_offset 16
		; VBITS_GE_128-NEXT: ldp q0, q5, [x1, #32]
		; VBITS_GE_128-NEXT: ldp q1, q2, [x0]
		; VBITS_GE_128-NEXT: ldp q3, q4, [x0, #32]
		; VBITS_GE_128-NEXT: cmeq v0.2d, v3.2d, v0.2d
		; VBITS_GE_128-NEXT: ldp q6, q7, [x1]
		; VBITS_GE_128-NEXT: cmeq v5.2d, v4.2d, v5.2d
		; VBITS_GE_128-NEXT: uzp1 v3.4s, v3.4s, v4.4s
		; VBITS_GE_128-NEXT: uzp1 v0.4s, v0.4s, v5.4s
		; VBITS_GE_128-NEXT: cmeq v6.2d, v1.2d, v6.2d
		; VBITS_GE_128-NEXT: uzp1 v1.4s, v1.4s, v2.4s
		; VBITS_GE_128-NEXT: cmeq v7.2d, v2.2d, v7.2d
		; VBITS_GE_128-NEXT: uzp1 v5.4s, v6.4s, v7.4s
		; VBITS_GE_128-NEXT: uzp1 v0.8h, v5.8h, v0.8h
		; VBITS_GE_128-NEXT: xtn v0.8b, v0.8h
		; VBITS_GE_128-NEXT: umov w8, v0.b[1]
		; VBITS_GE_128-NEXT: umov w9, v0.b[0]
		; VBITS_GE_128-NEXT: umov w10, v0.b[2]
		; VBITS_GE_128-NEXT: umov w11, v0.b[3]
		; VBITS_GE_128-NEXT: umov w12, v0.b[4]
		; VBITS_GE_128-NEXT: and w8, w8, #0x1
		; VBITS_GE_128-NEXT: and w9, w9, #0x1
		; VBITS_GE_128-NEXT: and w10, w10, #0x1
		; VBITS_GE_128-NEXT: and w11, w11, #0x1
		; VBITS_GE_128-NEXT: and w12, w12, #0x1
		; VBITS_GE_128-NEXT: bfi w9, w8, #1, #1
		; VBITS_GE_128-NEXT: umov w8, v0.b[5]
		; VBITS_GE_128-NEXT: bfi w9, w10, #2, #1
		; VBITS_GE_128-NEXT: umov w10, v0.b[6]
		; VBITS_GE_128-NEXT: bfi w9, w11, #3, #1
		; VBITS_GE_128-NEXT: umov w11, v0.b[7]
		; VBITS_GE_128-NEXT: bfi w9, w12, #4, #1
		; VBITS_GE_128-NEXT: and w8, w8, #0x1
		; VBITS_GE_128-NEXT: uzp1 v0.8h, v1.8h, v3.8h
		; VBITS_GE_128-NEXT: bfi w9, w8, #5, #1
		; VBITS_GE_128-NEXT: and w8, w10, #0x1
		; VBITS_GE_128-NEXT: orr w8, w9, w8, lsl #6
		; VBITS_GE_128-NEXT: xtn v0.8b, v0.8h
		; VBITS_GE_128-NEXT: orr w9, w8, w11, lsl #7
		; VBITS_GE_128-NEXT: and w8, w9, #0xff
		; VBITS_GE_128-NEXT: tbnz w9, #0, .LBB9_9
		; VBITS_GE_128-NEXT: // %bb.1: // %else
		; VBITS_GE_128-NEXT: tbnz w8, #1, .LBB9_10
		; VBITS_GE_128-NEXT: .LBB9_2: // %else2
		; VBITS_GE_128-NEXT: tbnz w8, #2, .LBB9_11
		; VBITS_GE_128-NEXT: .LBB9_3: // %else4
		; VBITS_GE_128-NEXT: tbnz w8, #3, .LBB9_12
		; VBITS_GE_128-NEXT: .LBB9_4: // %else6
		; VBITS_GE_128-NEXT: tbnz w8, #4, .LBB9_13
		; VBITS_GE_128-NEXT: .LBB9_5: // %else8
		; VBITS_GE_128-NEXT: tbnz w8, #5, .LBB9_14
		; VBITS_GE_128-NEXT: .LBB9_6: // %else10
		; VBITS_GE_128-NEXT: tbnz w8, #6, .LBB9_15
		; VBITS_GE_128-NEXT: .LBB9_7: // %else12
		; VBITS_GE_128-NEXT: tbnz w8, #7, .LBB9_16
		; VBITS_GE_128-NEXT: .LBB9_8: // %else14
		; VBITS_GE_128-NEXT: add sp, sp, #16
		; VBITS_GE_128-NEXT: ret
		; VBITS_GE_128-NEXT: .LBB9_9: // %cond.store
		; VBITS_GE_128-NEXT: st1 { v0.b }[0], [x2]
		; VBITS_GE_128-NEXT: tbz w8, #1, .LBB9_2
		; VBITS_GE_128-NEXT: .LBB9_10: // %cond.store1
		; VBITS_GE_128-NEXT: add x9, x2, #1
		; VBITS_GE_128-NEXT: st1 { v0.b }[1], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #2, .LBB9_3
		; VBITS_GE_128-NEXT: .LBB9_11: // %cond.store3
		; VBITS_GE_128-NEXT: add x9, x2, #2
		; VBITS_GE_128-NEXT: st1 { v0.b }[2], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #3, .LBB9_4
		; VBITS_GE_128-NEXT: .LBB9_12: // %cond.store5
		; VBITS_GE_128-NEXT: add x9, x2, #3
		; VBITS_GE_128-NEXT: st1 { v0.b }[3], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #4, .LBB9_5
		; VBITS_GE_128-NEXT: .LBB9_13: // %cond.store7
		; VBITS_GE_128-NEXT: add x9, x2, #4
		; VBITS_GE_128-NEXT: st1 { v0.b }[4], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #5, .LBB9_6
		; VBITS_GE_128-NEXT: .LBB9_14: // %cond.store9
		; VBITS_GE_128-NEXT: add x9, x2, #5
		; VBITS_GE_128-NEXT: st1 { v0.b }[5], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #6, .LBB9_7
		; VBITS_GE_128-NEXT: .LBB9_15: // %cond.store11
		; VBITS_GE_128-NEXT: add x9, x2, #6
		; VBITS_GE_128-NEXT: st1 { v0.b }[6], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #7, .LBB9_8
		; VBITS_GE_128-NEXT: .LBB9_16: // %cond.store13
		; VBITS_GE_128-NEXT: add x8, x2, #7
		; VBITS_GE_128-NEXT: st1 { v0.b }[7], [x8]
		; VBITS_GE_128-NEXT: add sp, sp, #16
		; VBITS_GE_128-NEXT: ret
		;
; VBITS_GE_256-LABEL: masked_store_trunc_v8i64i8:		; VBITS_GE_256-LABEL: masked_store_trunc_v8i64i8:
; VBITS_GE_256: // %bb.0:		; VBITS_GE_256: // %bb.0:
; VBITS_GE_256-NEXT: mov x8, #4		; VBITS_GE_256-NEXT: mov x8, #4
; VBITS_GE_256-NEXT: ptrue p0.d, vl4		; VBITS_GE_256-NEXT: ptrue p0.d, vl4
; VBITS_GE_256-NEXT: ld1d { z0.d }, p0/z, [x0, x8, lsl #3]		; VBITS_GE_256-NEXT: ld1d { z0.d }, p0/z, [x0, x8, lsl #3]
; VBITS_GE_256-NEXT: ld1d { z1.d }, p0/z, [x0]		; VBITS_GE_256-NEXT: ld1d { z1.d }, p0/z, [x0]
; VBITS_GE_256-NEXT: ld1d { z2.d }, p0/z, [x1, x8, lsl #3]		; VBITS_GE_256-NEXT: ld1d { z2.d }, p0/z, [x1, x8, lsl #3]
; VBITS_GE_256-NEXT: ld1d { z3.d }, p0/z, [x1]		; VBITS_GE_256-NEXT: ld1d { z3.d }, p0/z, [x1]
; [25 lines collapsed in the diff view]
		; VBITS_GE_512-NEXT: ret
%b = load <8 x i64>, <8 x i64>* %bp		%b = load <8 x i64>, <8 x i64>* %bp
%mask = icmp eq <8 x i64> %a, %b		%mask = icmp eq <8 x i64> %a, %b
%val = trunc <8 x i64> %a to <8 x i8>		%val = trunc <8 x i64> %a to <8 x i8>
call void @llvm.masked.store.v8i8(<8 x i8> %val, <8 x i8>* %dest, i32 8, <8 x i1> %mask)		call void @llvm.masked.store.v8i8(<8 x i8> %val, <8 x i8>* %dest, i32 8, <8 x i1> %mask)
ret void		ret void
}		}

define void @masked_store_trunc_v8i64i16(<8 x i64>* %ap, <8 x i64>* %bp, <8 x i16>* %dest) #0 {		define void @masked_store_trunc_v8i64i16(<8 x i64>* %ap, <8 x i64>* %bp, <8 x i16>* %dest) #0 {
		; VBITS_GE_SVE_128-LABEL: masked_store_trunc_v8i64i16:
		; VBITS_GE_SVE_128: // %bb.0:
		; VBITS_GE_SVE_128-NEXT: ldp q2, q3, [x0, #32]
		; VBITS_GE_SVE_128-NEXT: adrp x8, .LCPI10_0
		; VBITS_GE_SVE_128-NEXT: ptrue p0.h, vl8
		; VBITS_GE_SVE_128-NEXT: ldp q4, q5, [x1, #32]
		; VBITS_GE_SVE_128-NEXT: xtn v23.2s, v3.2d
		; VBITS_GE_SVE_128-NEXT: xtn v22.2s, v2.2d
		; VBITS_GE_SVE_128-NEXT: cmeq v4.2d, v2.2d, v4.2d
		; VBITS_GE_SVE_128-NEXT: ldp q0, q1, [x0]
		; VBITS_GE_SVE_128-NEXT: cmeq v5.2d, v3.2d, v5.2d
		; VBITS_GE_SVE_128-NEXT: xtn v19.2s, v5.2d
		; VBITS_GE_SVE_128-NEXT: xtn v18.2s, v4.2d
		; VBITS_GE_SVE_128-NEXT: ldp q6, q7, [x1]
		; VBITS_GE_SVE_128-NEXT: xtn v21.2s, v1.2d
		; VBITS_GE_SVE_128-NEXT: xtn v20.2s, v0.2d
		; VBITS_GE_SVE_128-NEXT: cmeq v6.2d, v0.2d, v6.2d
		; VBITS_GE_SVE_128-NEXT: cmeq v7.2d, v1.2d, v7.2d
		; VBITS_GE_SVE_128-NEXT: ldr q2, [x8, :lo12:.LCPI10_0]
		; VBITS_GE_SVE_128-NEXT: xtn v17.2s, v7.2d
		; VBITS_GE_SVE_128-NEXT: xtn v16.2s, v6.2d
		; VBITS_GE_SVE_128-NEXT: tbl v1.16b, { v20.16b, v21.16b, v22.16b, v23.16b }, v2.16b
		; VBITS_GE_SVE_128-NEXT: tbl v0.16b, { v16.16b, v17.16b, v18.16b, v19.16b }, v2.16b
		; VBITS_GE_SVE_128-NEXT: cmpne p0.h, p0/z, z0.h, #0
		; VBITS_GE_SVE_128-NEXT: st1h { z1.h }, p0, [x2]
		; VBITS_GE_SVE_128-NEXT: ret
		;
		; VBITS_GE_128-LABEL: masked_store_trunc_v8i64i16:
		; VBITS_GE_128: // %bb.0:
		; VBITS_GE_128-NEXT: sub sp, sp, #16
		; VBITS_GE_128-NEXT: .cfi_def_cfa_offset 16
		; VBITS_GE_128-NEXT: ldp q0, q5, [x1, #32]
		; VBITS_GE_128-NEXT: ldp q1, q2, [x0]
		; VBITS_GE_128-NEXT: ldp q3, q4, [x0, #32]
		; VBITS_GE_128-NEXT: cmeq v0.2d, v3.2d, v0.2d
		; VBITS_GE_128-NEXT: ldp q6, q7, [x1]
		; VBITS_GE_128-NEXT: cmeq v5.2d, v4.2d, v5.2d
		; VBITS_GE_128-NEXT: uzp1 v3.4s, v3.4s, v4.4s
		; VBITS_GE_128-NEXT: uzp1 v0.4s, v0.4s, v5.4s
		; VBITS_GE_128-NEXT: cmeq v6.2d, v1.2d, v6.2d
		; VBITS_GE_128-NEXT: cmeq v7.2d, v2.2d, v7.2d
		; VBITS_GE_128-NEXT: uzp1 v5.4s, v6.4s, v7.4s
		; VBITS_GE_128-NEXT: uzp1 v0.8h, v5.8h, v0.8h
		; VBITS_GE_128-NEXT: xtn v0.8b, v0.8h
		; VBITS_GE_128-NEXT: umov w8, v0.b[1]
		; VBITS_GE_128-NEXT: umov w9, v0.b[0]
		; VBITS_GE_128-NEXT: umov w10, v0.b[2]
		; VBITS_GE_128-NEXT: umov w11, v0.b[3]
		; VBITS_GE_128-NEXT: umov w12, v0.b[4]
		; VBITS_GE_128-NEXT: and w8, w8, #0x1
		; VBITS_GE_128-NEXT: and w9, w9, #0x1
		; VBITS_GE_128-NEXT: and w10, w10, #0x1
		; VBITS_GE_128-NEXT: and w11, w11, #0x1
		; VBITS_GE_128-NEXT: and w12, w12, #0x1
		; VBITS_GE_128-NEXT: bfi w9, w8, #1, #1
		; VBITS_GE_128-NEXT: umov w8, v0.b[5]
		; VBITS_GE_128-NEXT: bfi w9, w10, #2, #1
		; VBITS_GE_128-NEXT: umov w10, v0.b[6]
		; VBITS_GE_128-NEXT: bfi w9, w11, #3, #1
		; VBITS_GE_128-NEXT: umov w11, v0.b[7]
		; VBITS_GE_128-NEXT: uzp1 v0.4s, v1.4s, v2.4s
		; VBITS_GE_128-NEXT: bfi w9, w12, #4, #1
		; VBITS_GE_128-NEXT: and w8, w8, #0x1
		; VBITS_GE_128-NEXT: and w10, w10, #0x1
		; VBITS_GE_128-NEXT: bfi w9, w8, #5, #1
		; VBITS_GE_128-NEXT: orr w8, w9, w10, lsl #6
		; VBITS_GE_128-NEXT: orr w9, w8, w11, lsl #7
		; VBITS_GE_128-NEXT: uzp1 v0.8h, v0.8h, v3.8h
		; VBITS_GE_128-NEXT: and w8, w9, #0xff
		; VBITS_GE_128-NEXT: tbnz w9, #0, .LBB10_9
		; VBITS_GE_128-NEXT: // %bb.1: // %else
		; VBITS_GE_128-NEXT: tbnz w8, #1, .LBB10_10
		; VBITS_GE_128-NEXT: .LBB10_2: // %else2
		; VBITS_GE_128-NEXT: tbnz w8, #2, .LBB10_11
		; VBITS_GE_128-NEXT: .LBB10_3: // %else4
		; VBITS_GE_128-NEXT: tbnz w8, #3, .LBB10_12
		; VBITS_GE_128-NEXT: .LBB10_4: // %else6
		; VBITS_GE_128-NEXT: tbnz w8, #4, .LBB10_13
		; VBITS_GE_128-NEXT: .LBB10_5: // %else8
		; VBITS_GE_128-NEXT: tbnz w8, #5, .LBB10_14
		; VBITS_GE_128-NEXT: .LBB10_6: // %else10
		; VBITS_GE_128-NEXT: tbnz w8, #6, .LBB10_15
		; VBITS_GE_128-NEXT: .LBB10_7: // %else12
		; VBITS_GE_128-NEXT: tbnz w8, #7, .LBB10_16
		; VBITS_GE_128-NEXT: .LBB10_8: // %else14
		; VBITS_GE_128-NEXT: add sp, sp, #16
		; VBITS_GE_128-NEXT: ret
		; VBITS_GE_128-NEXT: .LBB10_9: // %cond.store
		; VBITS_GE_128-NEXT: str h0, [x2]
		; VBITS_GE_128-NEXT: tbz w8, #1, .LBB10_2
		; VBITS_GE_128-NEXT: .LBB10_10: // %cond.store1
		; VBITS_GE_128-NEXT: add x9, x2, #2
		; VBITS_GE_128-NEXT: st1 { v0.h }[1], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #2, .LBB10_3
		; VBITS_GE_128-NEXT: .LBB10_11: // %cond.store3
		; VBITS_GE_128-NEXT: add x9, x2, #4
		; VBITS_GE_128-NEXT: st1 { v0.h }[2], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #3, .LBB10_4
		; VBITS_GE_128-NEXT: .LBB10_12: // %cond.store5
		; VBITS_GE_128-NEXT: add x9, x2, #6
		; VBITS_GE_128-NEXT: st1 { v0.h }[3], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #4, .LBB10_5
		; VBITS_GE_128-NEXT: .LBB10_13: // %cond.store7
		; VBITS_GE_128-NEXT: add x9, x2, #8
		; VBITS_GE_128-NEXT: st1 { v0.h }[4], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #5, .LBB10_6
		; VBITS_GE_128-NEXT: .LBB10_14: // %cond.store9
		; VBITS_GE_128-NEXT: add x9, x2, #10
		; VBITS_GE_128-NEXT: st1 { v0.h }[5], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #6, .LBB10_7
		; VBITS_GE_128-NEXT: .LBB10_15: // %cond.store11
		; VBITS_GE_128-NEXT: add x9, x2, #12
		; VBITS_GE_128-NEXT: st1 { v0.h }[6], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #7, .LBB10_8
		; VBITS_GE_128-NEXT: .LBB10_16: // %cond.store13
		; VBITS_GE_128-NEXT: add x8, x2, #14
		; VBITS_GE_128-NEXT: st1 { v0.h }[7], [x8]
		; VBITS_GE_128-NEXT: add sp, sp, #16
		; VBITS_GE_128-NEXT: ret
		;
; VBITS_GE_256-LABEL: masked_store_trunc_v8i64i16:
; VBITS_GE_256: // %bb.0:
; VBITS_GE_256-NEXT: mov x8, #4
; VBITS_GE_256-NEXT: ptrue p0.d, vl4
; VBITS_GE_256-NEXT: ld1d { z0.d }, p0/z, [x0, x8, lsl #3]
; VBITS_GE_256-NEXT: ld1d { z1.d }, p0/z, [x0]
; VBITS_GE_256-NEXT: ld1d { z2.d }, p0/z, [x1, x8, lsl #3]
; VBITS_GE_256-NEXT: ld1d { z3.d }, p0/z, [x1]
; [28 unchanged lines not shown]
; VBITS_GE_512-NEXT: ret
  %b = load <8 x i64>, <8 x i64>* %bp
  %mask = icmp eq <8 x i64> %a, %b
  %val = trunc <8 x i64> %a to <8 x i16>
  call void @llvm.masked.store.v8i16(<8 x i16> %val, <8 x i16>* %dest, i32 8, <8 x i1> %mask)
  ret void
}

define void @masked_store_trunc_v8i64i32(<8 x i64>* %ap, <8 x i64>* %bp, <8 x i32>* %dest) #0 {
		; VBITS_GE_SVE_128-LABEL: masked_store_trunc_v8i64i32:
		; VBITS_GE_SVE_128: // %bb.0:
		; VBITS_GE_SVE_128-NEXT: ldp q0, q5, [x1]
		; VBITS_GE_SVE_128-NEXT: mov x8, #4
		; VBITS_GE_SVE_128-NEXT: ptrue p0.s, vl4
		; VBITS_GE_SVE_128-NEXT: ldp q1, q2, [x0, #32]
		; VBITS_GE_SVE_128-NEXT: ldp q3, q4, [x0]
		; VBITS_GE_SVE_128-NEXT: cmeq v0.2d, v3.2d, v0.2d
		; VBITS_GE_SVE_128-NEXT: ldp q6, q7, [x1, #32]
		; VBITS_GE_SVE_128-NEXT: cmeq v5.2d, v4.2d, v5.2d
		; VBITS_GE_SVE_128-NEXT: uzp1 v3.4s, v3.4s, v4.4s
		; VBITS_GE_SVE_128-NEXT: uzp1 v0.4s, v0.4s, v5.4s
		; VBITS_GE_SVE_128-NEXT: cmeq v6.2d, v1.2d, v6.2d
		; VBITS_GE_SVE_128-NEXT: uzp1 v1.4s, v1.4s, v2.4s
		; VBITS_GE_SVE_128-NEXT: cmeq v7.2d, v2.2d, v7.2d
		; VBITS_GE_SVE_128-NEXT: cmpne p1.s, p0/z, z0.s, #0
		; VBITS_GE_SVE_128-NEXT: uzp1 v5.4s, v6.4s, v7.4s
		; VBITS_GE_SVE_128-NEXT: cmpne p0.s, p0/z, z5.s, #0
		; VBITS_GE_SVE_128-NEXT: st1w { z1.s }, p0, [x2, x8, lsl #2]
		; VBITS_GE_SVE_128-NEXT: st1w { z3.s }, p1, [x2]
		; VBITS_GE_SVE_128-NEXT: ret
		;
		; VBITS_GE_128-LABEL: masked_store_trunc_v8i64i32:
		; VBITS_GE_128: // %bb.0:
		; VBITS_GE_128-NEXT: sub sp, sp, #16
		; VBITS_GE_128-NEXT: .cfi_def_cfa_offset 16
		; VBITS_GE_128-NEXT: ldp q2, q5, [x1, #32]
		; VBITS_GE_128-NEXT: ldp q3, q4, [x0]
		; VBITS_GE_128-NEXT: ldp q0, q1, [x0, #32]
		; VBITS_GE_128-NEXT: cmeq v2.2d, v0.2d, v2.2d
		; VBITS_GE_128-NEXT: ldp q6, q7, [x1]
		; VBITS_GE_128-NEXT: cmeq v5.2d, v1.2d, v5.2d
		; VBITS_GE_128-NEXT: uzp1 v2.4s, v2.4s, v5.4s
		; VBITS_GE_128-NEXT: cmeq v6.2d, v3.2d, v6.2d
		; VBITS_GE_128-NEXT: cmeq v7.2d, v4.2d, v7.2d
		; VBITS_GE_128-NEXT: uzp1 v5.4s, v6.4s, v7.4s
		; VBITS_GE_128-NEXT: uzp1 v2.8h, v5.8h, v2.8h
		; VBITS_GE_128-NEXT: xtn v2.8b, v2.8h
		; VBITS_GE_128-NEXT: umov w8, v2.b[1]
		; VBITS_GE_128-NEXT: umov w9, v2.b[2]
		; VBITS_GE_128-NEXT: umov w10, v2.b[0]
		; VBITS_GE_128-NEXT: umov w11, v2.b[3]
		; VBITS_GE_128-NEXT: umov w12, v2.b[4]
		; VBITS_GE_128-NEXT: umov w13, v2.b[5]
		; VBITS_GE_128-NEXT: umov w14, v2.b[6]
		; VBITS_GE_128-NEXT: and w8, w8, #0x1
		; VBITS_GE_128-NEXT: and w9, w9, #0x1
		; VBITS_GE_128-NEXT: and w10, w10, #0x1
		; VBITS_GE_128-NEXT: and w11, w11, #0x1
		; VBITS_GE_128-NEXT: bfi w10, w8, #1, #1
		; VBITS_GE_128-NEXT: and w8, w12, #0x1
		; VBITS_GE_128-NEXT: bfi w10, w9, #2, #1
		; VBITS_GE_128-NEXT: and w9, w13, #0x1
		; VBITS_GE_128-NEXT: bfi w10, w11, #3, #1
		; VBITS_GE_128-NEXT: umov w11, v2.b[7]
		; VBITS_GE_128-NEXT: bfi w10, w8, #4, #1
		; VBITS_GE_128-NEXT: and w8, w14, #0x1
		; VBITS_GE_128-NEXT: bfi w10, w9, #5, #1
		; VBITS_GE_128-NEXT: orr w8, w10, w8, lsl #6
		; VBITS_GE_128-NEXT: orr w9, w8, w11, lsl #7
		; VBITS_GE_128-NEXT: uzp1 v2.4s, v3.4s, v4.4s
		; VBITS_GE_128-NEXT: and w8, w9, #0xff
		; VBITS_GE_128-NEXT: tbnz w9, #0, .LBB11_9
		; VBITS_GE_128-NEXT: // %bb.1: // %else
		; VBITS_GE_128-NEXT: tbnz w8, #1, .LBB11_10
		; VBITS_GE_128-NEXT: .LBB11_2: // %else2
		; VBITS_GE_128-NEXT: tbnz w8, #2, .LBB11_11
		; VBITS_GE_128-NEXT: .LBB11_3: // %else4
		; VBITS_GE_128-NEXT: tbnz w8, #3, .LBB11_12
		; VBITS_GE_128-NEXT: .LBB11_4: // %else6
		; VBITS_GE_128-NEXT: uzp1 v0.4s, v0.4s, v1.4s
		; VBITS_GE_128-NEXT: tbnz w8, #4, .LBB11_13
		; VBITS_GE_128-NEXT: .LBB11_5: // %else8
		; VBITS_GE_128-NEXT: tbnz w8, #5, .LBB11_14
		; VBITS_GE_128-NEXT: .LBB11_6: // %else10
		; VBITS_GE_128-NEXT: tbnz w8, #6, .LBB11_15
		; VBITS_GE_128-NEXT: .LBB11_7: // %else12
		; VBITS_GE_128-NEXT: tbnz w8, #7, .LBB11_16
		; VBITS_GE_128-NEXT: .LBB11_8: // %else14
		; VBITS_GE_128-NEXT: add sp, sp, #16
		; VBITS_GE_128-NEXT: ret
		; VBITS_GE_128-NEXT: .LBB11_9: // %cond.store
		; VBITS_GE_128-NEXT: str s2, [x2]
		; VBITS_GE_128-NEXT: tbz w8, #1, .LBB11_2
		; VBITS_GE_128-NEXT: .LBB11_10: // %cond.store1
		; VBITS_GE_128-NEXT: add x9, x2, #4
		; VBITS_GE_128-NEXT: st1 { v2.s }[1], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #2, .LBB11_3
		; VBITS_GE_128-NEXT: .LBB11_11: // %cond.store3
		; VBITS_GE_128-NEXT: add x9, x2, #8
		; VBITS_GE_128-NEXT: st1 { v2.s }[2], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #3, .LBB11_4
		; VBITS_GE_128-NEXT: .LBB11_12: // %cond.store5
		; VBITS_GE_128-NEXT: add x9, x2, #12
		; VBITS_GE_128-NEXT: st1 { v2.s }[3], [x9]
		; VBITS_GE_128-NEXT: uzp1 v0.4s, v0.4s, v1.4s
		; VBITS_GE_128-NEXT: tbz w8, #4, .LBB11_5
		; VBITS_GE_128-NEXT: .LBB11_13: // %cond.store7
		; VBITS_GE_128-NEXT: str s0, [x2, #16]
		; VBITS_GE_128-NEXT: tbz w8, #5, .LBB11_6
		; VBITS_GE_128-NEXT: .LBB11_14: // %cond.store9
		; VBITS_GE_128-NEXT: add x9, x2, #20
		; VBITS_GE_128-NEXT: st1 { v0.s }[1], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #6, .LBB11_7
		; VBITS_GE_128-NEXT: .LBB11_15: // %cond.store11
		; VBITS_GE_128-NEXT: add x9, x2, #24
		; VBITS_GE_128-NEXT: st1 { v0.s }[2], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #7, .LBB11_8
		; VBITS_GE_128-NEXT: .LBB11_16: // %cond.store13
		; VBITS_GE_128-NEXT: add x8, x2, #28
		; VBITS_GE_128-NEXT: st1 { v0.s }[3], [x8]
		; VBITS_GE_128-NEXT: add sp, sp, #16
		; VBITS_GE_128-NEXT: ret
		;
; VBITS_GE_256-LABEL: masked_store_trunc_v8i64i32:
; VBITS_GE_256: // %bb.0:
; VBITS_GE_256-NEXT: mov x8, #4
; VBITS_GE_256-NEXT: ptrue p0.d, vl4
; VBITS_GE_256-NEXT: ld1d { z0.d }, p0/z, [x0, x8, lsl #3]
; VBITS_GE_256-NEXT: ld1d { z1.d }, p0/z, [x0]
; VBITS_GE_256-NEXT: ld1d { z2.d }, p0/z, [x1, x8, lsl #3]
; VBITS_GE_256-NEXT: ld1d { z3.d }, p0/z, [x1]
; [25 unchanged lines not shown]
; VBITS_GE_512-NEXT: ret
  %b = load <8 x i64>, <8 x i64>* %bp
  %mask = icmp eq <8 x i64> %a, %b
  %val = trunc <8 x i64> %a to <8 x i32>
  call void @llvm.masked.store.v8i32(<8 x i32> %val, <8 x i32>* %dest, i32 8, <8 x i1> %mask)
  ret void
}

define void @masked_store_trunc_v16i32i8(<16 x i32>* %ap, <16 x i32>* %bp, <16 x i8>* %dest) #0 {
		; VBITS_GE_SVE_128-LABEL: masked_store_trunc_v16i32i8:
		; VBITS_GE_SVE_128: // %bb.0:
		; VBITS_GE_SVE_128-NEXT: ldp q0, q1, [x0]
		; VBITS_GE_SVE_128-NEXT: ptrue p0.b, vl16
		; VBITS_GE_SVE_128-NEXT: ldp q2, q3, [x0, #32]
		; VBITS_GE_SVE_128-NEXT: ldp q4, q5, [x1, #32]
		; VBITS_GE_SVE_128-NEXT: cmeq v4.4s, v2.4s, v4.4s
		; VBITS_GE_SVE_128-NEXT: uzp1 v2.8h, v2.8h, v3.8h
		; VBITS_GE_SVE_128-NEXT: ldp q6, q7, [x1]
		; VBITS_GE_SVE_128-NEXT: cmeq v5.4s, v3.4s, v5.4s
		; VBITS_GE_SVE_128-NEXT: uzp1 v4.8h, v4.8h, v5.8h
		; VBITS_GE_SVE_128-NEXT: cmeq v6.4s, v0.4s, v6.4s
		; VBITS_GE_SVE_128-NEXT: uzp1 v0.8h, v0.8h, v1.8h
		; VBITS_GE_SVE_128-NEXT: cmeq v7.4s, v1.4s, v7.4s
		; VBITS_GE_SVE_128-NEXT: uzp1 v5.8h, v6.8h, v7.8h
		; VBITS_GE_SVE_128-NEXT: uzp1 v0.16b, v0.16b, v2.16b
		; VBITS_GE_SVE_128-NEXT: uzp1 v1.16b, v5.16b, v4.16b
		; VBITS_GE_SVE_128-NEXT: cmpne p0.b, p0/z, z1.b, #0
		; VBITS_GE_SVE_128-NEXT: st1b { z0.b }, p0, [x2]
		; VBITS_GE_SVE_128-NEXT: ret
		;
		; VBITS_GE_128-LABEL: masked_store_trunc_v16i32i8:
		; VBITS_GE_128: // %bb.0:
		; VBITS_GE_128-NEXT: sub sp, sp, #16
		; VBITS_GE_128-NEXT: .cfi_def_cfa_offset 16
		; VBITS_GE_128-NEXT: ldp q3, q2, [x1]
		; VBITS_GE_128-NEXT: ldp q0, q1, [x0]
		; VBITS_GE_128-NEXT: cmeq v3.4s, v0.4s, v3.4s
		; VBITS_GE_128-NEXT: cmeq v2.4s, v1.4s, v2.4s
		; VBITS_GE_128-NEXT: ldp q5, q4, [x0, #32]
		; VBITS_GE_128-NEXT: uzp1 v2.8h, v3.8h, v2.8h
		; VBITS_GE_128-NEXT: uzp1 v0.8h, v0.8h, v1.8h
		; VBITS_GE_128-NEXT: xtn v2.8b, v2.8h
		; VBITS_GE_128-NEXT: ldp q6, q3, [x1, #32]
		; VBITS_GE_128-NEXT: umov w8, v2.b[1]
		; VBITS_GE_128-NEXT: umov w10, v2.b[2]
		; VBITS_GE_128-NEXT: umov w9, v2.b[0]
		; VBITS_GE_128-NEXT: umov w11, v2.b[3]
		; VBITS_GE_128-NEXT: umov w12, v2.b[4]
		; VBITS_GE_128-NEXT: umov w13, v2.b[5]
		; VBITS_GE_128-NEXT: cmeq v6.4s, v5.4s, v6.4s
		; VBITS_GE_128-NEXT: and w8, w8, #0x1
		; VBITS_GE_128-NEXT: and w10, w10, #0x1
		; VBITS_GE_128-NEXT: cmeq v3.4s, v4.4s, v3.4s
		; VBITS_GE_128-NEXT: and w9, w9, #0x1
		; VBITS_GE_128-NEXT: and w11, w11, #0x1
		; VBITS_GE_128-NEXT: and w12, w12, #0x1
		; VBITS_GE_128-NEXT: uzp1 v3.8h, v6.8h, v3.8h
		; VBITS_GE_128-NEXT: bfi w9, w8, #1, #1
		; VBITS_GE_128-NEXT: umov w8, v2.b[6]
		; VBITS_GE_128-NEXT: bfi w9, w10, #2, #1
		; VBITS_GE_128-NEXT: umov w10, v2.b[7]
		; VBITS_GE_128-NEXT: and w13, w13, #0x1
		; VBITS_GE_128-NEXT: bfi w9, w11, #3, #1
		; VBITS_GE_128-NEXT: xtn v3.8b, v3.8h
		; VBITS_GE_128-NEXT: bfi w9, w12, #4, #1
		; VBITS_GE_128-NEXT: and w8, w8, #0x1
		; VBITS_GE_128-NEXT: bfi w9, w13, #5, #1
		; VBITS_GE_128-NEXT: umov w11, v3.b[0]
		; VBITS_GE_128-NEXT: umov w12, v3.b[1]
		; VBITS_GE_128-NEXT: umov w13, v3.b[2]
		; VBITS_GE_128-NEXT: and w10, w10, #0x1
		; VBITS_GE_128-NEXT: orr w8, w9, w8, lsl #6
		; VBITS_GE_128-NEXT: umov w9, v3.b[3]
		; VBITS_GE_128-NEXT: orr w8, w8, w10, lsl #7
		; VBITS_GE_128-NEXT: umov w10, v3.b[4]
		; VBITS_GE_128-NEXT: and w11, w11, #0x1
		; VBITS_GE_128-NEXT: and w12, w12, #0x1
		; VBITS_GE_128-NEXT: uzp1 v2.8h, v5.8h, v4.8h
		; VBITS_GE_128-NEXT: orr w8, w8, w11, lsl #8
		; VBITS_GE_128-NEXT: and w11, w13, #0x1
		; VBITS_GE_128-NEXT: orr w8, w8, w12, lsl #9
		; VBITS_GE_128-NEXT: umov w12, v3.b[5]
		; VBITS_GE_128-NEXT: and w9, w9, #0x1
		; VBITS_GE_128-NEXT: orr w8, w8, w11, lsl #10
		; VBITS_GE_128-NEXT: umov w11, v3.b[6]
		; VBITS_GE_128-NEXT: and w10, w10, #0x1
		; VBITS_GE_128-NEXT: orr w8, w8, w9, lsl #11
		; VBITS_GE_128-NEXT: umov w9, v3.b[7]
		; VBITS_GE_128-NEXT: orr w8, w8, w10, lsl #12
		; VBITS_GE_128-NEXT: and w10, w12, #0x1
		; VBITS_GE_128-NEXT: uzp1 v0.16b, v0.16b, v2.16b
		; VBITS_GE_128-NEXT: and w11, w11, #0x1
		; VBITS_GE_128-NEXT: orr w8, w8, w10, lsl #13
		; VBITS_GE_128-NEXT: orr w8, w8, w11, lsl #14
		; VBITS_GE_128-NEXT: orr w9, w8, w9, lsl #15
		; VBITS_GE_128-NEXT: and w8, w9, #0xffff
		; VBITS_GE_128-NEXT: tbnz w9, #0, .LBB12_17
		; VBITS_GE_128-NEXT: // %bb.1: // %else
		; VBITS_GE_128-NEXT: tbnz w8, #1, .LBB12_18
		; VBITS_GE_128-NEXT: .LBB12_2: // %else2
		; VBITS_GE_128-NEXT: tbnz w8, #2, .LBB12_19
		; VBITS_GE_128-NEXT: .LBB12_3: // %else4
		; VBITS_GE_128-NEXT: tbnz w8, #3, .LBB12_20
		; VBITS_GE_128-NEXT: .LBB12_4: // %else6
		; VBITS_GE_128-NEXT: tbnz w8, #4, .LBB12_21
		; VBITS_GE_128-NEXT: .LBB12_5: // %else8
		; VBITS_GE_128-NEXT: tbnz w8, #5, .LBB12_22
		; VBITS_GE_128-NEXT: .LBB12_6: // %else10
		; VBITS_GE_128-NEXT: tbnz w8, #6, .LBB12_23
		; VBITS_GE_128-NEXT: .LBB12_7: // %else12
		; VBITS_GE_128-NEXT: tbnz w8, #7, .LBB12_24
		; VBITS_GE_128-NEXT: .LBB12_8: // %else14
		; VBITS_GE_128-NEXT: tbnz w8, #8, .LBB12_25
		; VBITS_GE_128-NEXT: .LBB12_9: // %else16
		; VBITS_GE_128-NEXT: tbnz w8, #9, .LBB12_26
		; VBITS_GE_128-NEXT: .LBB12_10: // %else18
		; VBITS_GE_128-NEXT: tbnz w8, #10, .LBB12_27
		; VBITS_GE_128-NEXT: .LBB12_11: // %else20
		; VBITS_GE_128-NEXT: tbnz w8, #11, .LBB12_28
		; VBITS_GE_128-NEXT: .LBB12_12: // %else22
		; VBITS_GE_128-NEXT: tbnz w8, #12, .LBB12_29
		; VBITS_GE_128-NEXT: .LBB12_13: // %else24
		; VBITS_GE_128-NEXT: tbnz w8, #13, .LBB12_30
		; VBITS_GE_128-NEXT: .LBB12_14: // %else26
		; VBITS_GE_128-NEXT: tbnz w8, #14, .LBB12_31
		; VBITS_GE_128-NEXT: .LBB12_15: // %else28
		; VBITS_GE_128-NEXT: tbnz w8, #15, .LBB12_32
		; VBITS_GE_128-NEXT: .LBB12_16: // %else30
		; VBITS_GE_128-NEXT: add sp, sp, #16
		; VBITS_GE_128-NEXT: ret
		; VBITS_GE_128-NEXT: .LBB12_17: // %cond.store
		; VBITS_GE_128-NEXT: st1 { v0.b }[0], [x2]
		; VBITS_GE_128-NEXT: tbz w8, #1, .LBB12_2
		; VBITS_GE_128-NEXT: .LBB12_18: // %cond.store1
		; VBITS_GE_128-NEXT: add x9, x2, #1
		; VBITS_GE_128-NEXT: st1 { v0.b }[1], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #2, .LBB12_3
		; VBITS_GE_128-NEXT: .LBB12_19: // %cond.store3
		; VBITS_GE_128-NEXT: add x9, x2, #2
		; VBITS_GE_128-NEXT: st1 { v0.b }[2], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #3, .LBB12_4
		; VBITS_GE_128-NEXT: .LBB12_20: // %cond.store5
		; VBITS_GE_128-NEXT: add x9, x2, #3
		; VBITS_GE_128-NEXT: st1 { v0.b }[3], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #4, .LBB12_5
		; VBITS_GE_128-NEXT: .LBB12_21: // %cond.store7
		; VBITS_GE_128-NEXT: add x9, x2, #4
		; VBITS_GE_128-NEXT: st1 { v0.b }[4], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #5, .LBB12_6
		; VBITS_GE_128-NEXT: .LBB12_22: // %cond.store9
		; VBITS_GE_128-NEXT: add x9, x2, #5
		; VBITS_GE_128-NEXT: st1 { v0.b }[5], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #6, .LBB12_7
		; VBITS_GE_128-NEXT: .LBB12_23: // %cond.store11
		; VBITS_GE_128-NEXT: add x9, x2, #6
		; VBITS_GE_128-NEXT: st1 { v0.b }[6], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #7, .LBB12_8
		; VBITS_GE_128-NEXT: .LBB12_24: // %cond.store13
		; VBITS_GE_128-NEXT: add x9, x2, #7
		; VBITS_GE_128-NEXT: st1 { v0.b }[7], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #8, .LBB12_9
		; VBITS_GE_128-NEXT: .LBB12_25: // %cond.store15
		; VBITS_GE_128-NEXT: add x9, x2, #8
		; VBITS_GE_128-NEXT: st1 { v0.b }[8], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #9, .LBB12_10
		; VBITS_GE_128-NEXT: .LBB12_26: // %cond.store17
		; VBITS_GE_128-NEXT: add x9, x2, #9
		; VBITS_GE_128-NEXT: st1 { v0.b }[9], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #10, .LBB12_11
		; VBITS_GE_128-NEXT: .LBB12_27: // %cond.store19
		; VBITS_GE_128-NEXT: add x9, x2, #10
		; VBITS_GE_128-NEXT: st1 { v0.b }[10], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #11, .LBB12_12
		; VBITS_GE_128-NEXT: .LBB12_28: // %cond.store21
		; VBITS_GE_128-NEXT: add x9, x2, #11
		; VBITS_GE_128-NEXT: st1 { v0.b }[11], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #12, .LBB12_13
		; VBITS_GE_128-NEXT: .LBB12_29: // %cond.store23
		; VBITS_GE_128-NEXT: add x9, x2, #12
		; VBITS_GE_128-NEXT: st1 { v0.b }[12], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #13, .LBB12_14
		; VBITS_GE_128-NEXT: .LBB12_30: // %cond.store25
		; VBITS_GE_128-NEXT: add x9, x2, #13
		; VBITS_GE_128-NEXT: st1 { v0.b }[13], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #14, .LBB12_15
		; VBITS_GE_128-NEXT: .LBB12_31: // %cond.store27
		; VBITS_GE_128-NEXT: add x9, x2, #14
		; VBITS_GE_128-NEXT: st1 { v0.b }[14], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #15, .LBB12_16
		; VBITS_GE_128-NEXT: .LBB12_32: // %cond.store29
		; VBITS_GE_128-NEXT: add x8, x2, #15
		; VBITS_GE_128-NEXT: st1 { v0.b }[15], [x8]
		; VBITS_GE_128-NEXT: add sp, sp, #16
		; VBITS_GE_128-NEXT: ret
		;
; VBITS_GE_256-LABEL: masked_store_trunc_v16i32i8:
; VBITS_GE_256: // %bb.0:
; VBITS_GE_256-NEXT: mov x8, #8
; VBITS_GE_256-NEXT: ptrue p0.s, vl8
; VBITS_GE_256-NEXT: ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
; VBITS_GE_256-NEXT: ld1w { z1.s }, p0/z, [x0]
; VBITS_GE_256-NEXT: ld1w { z2.s }, p0/z, [x1, x8, lsl #2]
; VBITS_GE_256-NEXT: ld1w { z3.s }, p0/z, [x1]
; [28 unchanged lines not shown]
; VBITS_GE_512-NEXT: ret
  %b = load <16 x i32>, <16 x i32>* %bp
  %mask = icmp eq <16 x i32> %a, %b
  %val = trunc <16 x i32> %a to <16 x i8>
  call void @llvm.masked.store.v16i8(<16 x i8> %val, <16 x i8>* %dest, i32 8, <16 x i1> %mask)
  ret void
}

define void @masked_store_trunc_v16i32i16(<16 x i32>* %ap, <16 x i32>* %bp, <16 x i16>* %dest) #0 {
		; VBITS_GE_SVE_128-LABEL: masked_store_trunc_v16i32i16:
		; VBITS_GE_SVE_128: // %bb.0:
		; VBITS_GE_SVE_128-NEXT: ldp q0, q5, [x1]
		; VBITS_GE_SVE_128-NEXT: mov x8, #8
		; VBITS_GE_SVE_128-NEXT: ptrue p0.h, vl8
		; VBITS_GE_SVE_128-NEXT: ldp q1, q2, [x0, #32]
		; VBITS_GE_SVE_128-NEXT: ldp q3, q4, [x0]
		; VBITS_GE_SVE_128-NEXT: cmeq v0.4s, v3.4s, v0.4s
		; VBITS_GE_SVE_128-NEXT: ldp q6, q7, [x1, #32]
		; VBITS_GE_SVE_128-NEXT: cmeq v5.4s, v4.4s, v5.4s
		; VBITS_GE_SVE_128-NEXT: uzp1 v3.8h, v3.8h, v4.8h
		; VBITS_GE_SVE_128-NEXT: uzp1 v0.8h, v0.8h, v5.8h
		; VBITS_GE_SVE_128-NEXT: cmeq v6.4s, v1.4s, v6.4s
		; VBITS_GE_SVE_128-NEXT: uzp1 v1.8h, v1.8h, v2.8h
		; VBITS_GE_SVE_128-NEXT: cmeq v7.4s, v2.4s, v7.4s
		; VBITS_GE_SVE_128-NEXT: cmpne p1.h, p0/z, z0.h, #0
		; VBITS_GE_SVE_128-NEXT: uzp1 v5.8h, v6.8h, v7.8h
		; VBITS_GE_SVE_128-NEXT: cmpne p0.h, p0/z, z5.h, #0
		; VBITS_GE_SVE_128-NEXT: st1h { z1.h }, p0, [x2, x8, lsl #1]
		; VBITS_GE_SVE_128-NEXT: st1h { z3.h }, p1, [x2]
		; VBITS_GE_SVE_128-NEXT: ret
		;
		; VBITS_GE_128-LABEL: masked_store_trunc_v16i32i16:
		; VBITS_GE_128: // %bb.0:
		; VBITS_GE_128-NEXT: sub sp, sp, #16
		; VBITS_GE_128-NEXT: .cfi_def_cfa_offset 16
		; VBITS_GE_128-NEXT: ldp q1, q0, [x1]
		; VBITS_GE_128-NEXT: ldp q2, q3, [x0]
		; VBITS_GE_128-NEXT: cmeq v1.4s, v2.4s, v1.4s
		; VBITS_GE_128-NEXT: cmeq v4.4s, v3.4s, v0.4s
		; VBITS_GE_128-NEXT: ldp q6, q5, [x1, #32]
		; VBITS_GE_128-NEXT: uzp1 v4.8h, v1.8h, v4.8h
		; VBITS_GE_128-NEXT: uzp1 v2.8h, v2.8h, v3.8h
		; VBITS_GE_128-NEXT: xtn v4.8b, v4.8h
		; VBITS_GE_128-NEXT: ldp q1, q0, [x0, #32]
		; VBITS_GE_128-NEXT: umov w8, v4.b[1]
		; VBITS_GE_128-NEXT: umov w10, v4.b[2]
		; VBITS_GE_128-NEXT: umov w9, v4.b[0]
		; VBITS_GE_128-NEXT: umov w11, v4.b[3]
		; VBITS_GE_128-NEXT: umov w12, v4.b[4]
		; VBITS_GE_128-NEXT: umov w13, v4.b[5]
		; VBITS_GE_128-NEXT: cmeq v6.4s, v1.4s, v6.4s
		; VBITS_GE_128-NEXT: and w8, w8, #0x1
		; VBITS_GE_128-NEXT: and w10, w10, #0x1
		; VBITS_GE_128-NEXT: cmeq v5.4s, v0.4s, v5.4s
		; VBITS_GE_128-NEXT: and w9, w9, #0x1
		; VBITS_GE_128-NEXT: and w11, w11, #0x1
		; VBITS_GE_128-NEXT: and w12, w12, #0x1
		; VBITS_GE_128-NEXT: uzp1 v5.8h, v6.8h, v5.8h
		; VBITS_GE_128-NEXT: bfi w9, w8, #1, #1
		; VBITS_GE_128-NEXT: umov w8, v4.b[6]
		; VBITS_GE_128-NEXT: bfi w9, w10, #2, #1
		; VBITS_GE_128-NEXT: umov w10, v4.b[7]
		; VBITS_GE_128-NEXT: and w13, w13, #0x1
		; VBITS_GE_128-NEXT: bfi w9, w11, #3, #1
		; VBITS_GE_128-NEXT: xtn v5.8b, v5.8h
		; VBITS_GE_128-NEXT: bfi w9, w12, #4, #1
		; VBITS_GE_128-NEXT: and w8, w8, #0x1
		; VBITS_GE_128-NEXT: bfi w9, w13, #5, #1
		; VBITS_GE_128-NEXT: umov w11, v5.b[0]
		; VBITS_GE_128-NEXT: umov w12, v5.b[1]
		; VBITS_GE_128-NEXT: umov w13, v5.b[2]
		; VBITS_GE_128-NEXT: and w10, w10, #0x1
		; VBITS_GE_128-NEXT: orr w8, w9, w8, lsl #6
		; VBITS_GE_128-NEXT: umov w9, v5.b[3]
		; VBITS_GE_128-NEXT: orr w8, w8, w10, lsl #7
		; VBITS_GE_128-NEXT: umov w10, v5.b[4]
		; VBITS_GE_128-NEXT: and w11, w11, #0x1
		; VBITS_GE_128-NEXT: and w12, w12, #0x1
		; VBITS_GE_128-NEXT: and w13, w13, #0x1
		; VBITS_GE_128-NEXT: orr w8, w8, w11, lsl #8
		; VBITS_GE_128-NEXT: umov w11, v5.b[5]
		; VBITS_GE_128-NEXT: orr w8, w8, w12, lsl #9
		; VBITS_GE_128-NEXT: umov w12, v5.b[6]
		; VBITS_GE_128-NEXT: and w9, w9, #0x1
		; VBITS_GE_128-NEXT: orr w8, w8, w13, lsl #10
		; VBITS_GE_128-NEXT: and w10, w10, #0x1
		; VBITS_GE_128-NEXT: orr w8, w8, w9, lsl #11
		; VBITS_GE_128-NEXT: umov w9, v5.b[7]
		; VBITS_GE_128-NEXT: and w11, w11, #0x1
		; VBITS_GE_128-NEXT: orr w8, w8, w10, lsl #12
		; VBITS_GE_128-NEXT: and w10, w12, #0x1
		; VBITS_GE_128-NEXT: orr w8, w8, w11, lsl #13
		; VBITS_GE_128-NEXT: orr w8, w8, w10, lsl #14
		; VBITS_GE_128-NEXT: orr w9, w8, w9, lsl #15
		; VBITS_GE_128-NEXT: and w8, w9, #0xffff
		; VBITS_GE_128-NEXT: tbnz w9, #0, .LBB13_17
		; VBITS_GE_128-NEXT: // %bb.1: // %else
		; VBITS_GE_128-NEXT: tbnz w8, #1, .LBB13_18
		; VBITS_GE_128-NEXT: .LBB13_2: // %else2
		; VBITS_GE_128-NEXT: tbnz w8, #2, .LBB13_19
		; VBITS_GE_128-NEXT: .LBB13_3: // %else4
		; VBITS_GE_128-NEXT: tbnz w8, #3, .LBB13_20
		; VBITS_GE_128-NEXT: .LBB13_4: // %else6
		; VBITS_GE_128-NEXT: tbnz w8, #4, .LBB13_21
		; VBITS_GE_128-NEXT: .LBB13_5: // %else8
		; VBITS_GE_128-NEXT: tbnz w8, #5, .LBB13_22
		; VBITS_GE_128-NEXT: .LBB13_6: // %else10
		; VBITS_GE_128-NEXT: tbnz w8, #6, .LBB13_23
		; VBITS_GE_128-NEXT: .LBB13_7: // %else12
		; VBITS_GE_128-NEXT: tbnz w8, #7, .LBB13_24
		; VBITS_GE_128-NEXT: .LBB13_8: // %else14
		; VBITS_GE_128-NEXT: uzp1 v0.8h, v1.8h, v0.8h
		; VBITS_GE_128-NEXT: tbnz w8, #8, .LBB13_25
		; VBITS_GE_128-NEXT: .LBB13_9: // %else16
		; VBITS_GE_128-NEXT: tbnz w8, #9, .LBB13_26
		; VBITS_GE_128-NEXT: .LBB13_10: // %else18
		; VBITS_GE_128-NEXT: tbnz w8, #10, .LBB13_27
		; VBITS_GE_128-NEXT: .LBB13_11: // %else20
		; VBITS_GE_128-NEXT: tbnz w8, #11, .LBB13_28
		; VBITS_GE_128-NEXT: .LBB13_12: // %else22
		; VBITS_GE_128-NEXT: tbnz w8, #12, .LBB13_29
		; VBITS_GE_128-NEXT: .LBB13_13: // %else24
		; VBITS_GE_128-NEXT: tbnz w8, #13, .LBB13_30
		; VBITS_GE_128-NEXT: .LBB13_14: // %else26
		; VBITS_GE_128-NEXT: tbnz w8, #14, .LBB13_31
		; VBITS_GE_128-NEXT: .LBB13_15: // %else28
		; VBITS_GE_128-NEXT: tbnz w8, #15, .LBB13_32
		; VBITS_GE_128-NEXT: .LBB13_16: // %else30
		; VBITS_GE_128-NEXT: add sp, sp, #16
		; VBITS_GE_128-NEXT: ret
		; VBITS_GE_128-NEXT: .LBB13_17: // %cond.store
		; VBITS_GE_128-NEXT: str h2, [x2]
		; VBITS_GE_128-NEXT: tbz w8, #1, .LBB13_2
		; VBITS_GE_128-NEXT: .LBB13_18: // %cond.store1
		; VBITS_GE_128-NEXT: add x9, x2, #2
		; VBITS_GE_128-NEXT: st1 { v2.h }[1], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #2, .LBB13_3
		; VBITS_GE_128-NEXT: .LBB13_19: // %cond.store3
		; VBITS_GE_128-NEXT: add x9, x2, #4
		; VBITS_GE_128-NEXT: st1 { v2.h }[2], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #3, .LBB13_4
		; VBITS_GE_128-NEXT: .LBB13_20: // %cond.store5
		; VBITS_GE_128-NEXT: add x9, x2, #6
		; VBITS_GE_128-NEXT: st1 { v2.h }[3], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #4, .LBB13_5
		; VBITS_GE_128-NEXT: .LBB13_21: // %cond.store7
		; VBITS_GE_128-NEXT: add x9, x2, #8
		; VBITS_GE_128-NEXT: st1 { v2.h }[4], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #5, .LBB13_6
		; VBITS_GE_128-NEXT: .LBB13_22: // %cond.store9
		; VBITS_GE_128-NEXT: add x9, x2, #10
		; VBITS_GE_128-NEXT: st1 { v2.h }[5], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #6, .LBB13_7
		; VBITS_GE_128-NEXT: .LBB13_23: // %cond.store11
		; VBITS_GE_128-NEXT: add x9, x2, #12
		; VBITS_GE_128-NEXT: st1 { v2.h }[6], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #7, .LBB13_8
		; VBITS_GE_128-NEXT: .LBB13_24: // %cond.store13
		; VBITS_GE_128-NEXT: add x9, x2, #14
		; VBITS_GE_128-NEXT: st1 { v2.h }[7], [x9]
		; VBITS_GE_128-NEXT: uzp1 v0.8h, v1.8h, v0.8h
		; VBITS_GE_128-NEXT: tbz w8, #8, .LBB13_9
		; VBITS_GE_128-NEXT: .LBB13_25: // %cond.store15
		; VBITS_GE_128-NEXT: str h0, [x2, #16]
		; VBITS_GE_128-NEXT: tbz w8, #9, .LBB13_10
		; VBITS_GE_128-NEXT: .LBB13_26: // %cond.store17
		; VBITS_GE_128-NEXT: add x9, x2, #18
		; VBITS_GE_128-NEXT: st1 { v0.h }[1], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #10, .LBB13_11
		; VBITS_GE_128-NEXT: .LBB13_27: // %cond.store19
		; VBITS_GE_128-NEXT: add x9, x2, #20
		; VBITS_GE_128-NEXT: st1 { v0.h }[2], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #11, .LBB13_12
		; VBITS_GE_128-NEXT: .LBB13_28: // %cond.store21
		; VBITS_GE_128-NEXT: add x9, x2, #22
		; VBITS_GE_128-NEXT: st1 { v0.h }[3], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #12, .LBB13_13
		; VBITS_GE_128-NEXT: .LBB13_29: // %cond.store23
		; VBITS_GE_128-NEXT: add x9, x2, #24
		; VBITS_GE_128-NEXT: st1 { v0.h }[4], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #13, .LBB13_14
		; VBITS_GE_128-NEXT: .LBB13_30: // %cond.store25
		; VBITS_GE_128-NEXT: add x9, x2, #26
		; VBITS_GE_128-NEXT: st1 { v0.h }[5], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #14, .LBB13_15
		; VBITS_GE_128-NEXT: .LBB13_31: // %cond.store27
		; VBITS_GE_128-NEXT: add x9, x2, #28
		; VBITS_GE_128-NEXT: st1 { v0.h }[6], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #15, .LBB13_16
		; VBITS_GE_128-NEXT: .LBB13_32: // %cond.store29
		; VBITS_GE_128-NEXT: add x8, x2, #30
		; VBITS_GE_128-NEXT: st1 { v0.h }[7], [x8]
		; VBITS_GE_128-NEXT: add sp, sp, #16
		; VBITS_GE_128-NEXT: ret
		;
; VBITS_GE_256-LABEL: masked_store_trunc_v16i32i16:
; VBITS_GE_256: // %bb.0:
; VBITS_GE_256-NEXT: mov x8, #8
; VBITS_GE_256-NEXT: ptrue p0.s, vl8
; VBITS_GE_256-NEXT: ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
; VBITS_GE_256-NEXT: ld1w { z1.s }, p0/z, [x0]
; VBITS_GE_256-NEXT: ld1w { z2.s }, p0/z, [x1, x8, lsl #2]
; VBITS_GE_256-NEXT: ld1w { z3.s }, p0/z, [x1]
; [28 unchanged lines not shown]
; VBITS_GE_512-NEXT: ret
  %b = load <16 x i32>, <16 x i32>* %bp
  %mask = icmp eq <16 x i32> %a, %b
  %val = trunc <16 x i32> %a to <16 x i16>
  call void @llvm.masked.store.v16i16(<16 x i16> %val, <16 x i16>* %dest, i32 8, <16 x i1> %mask)
  ret void
}

define void @masked_store_trunc_v32i16i8(<32 x i16>* %ap, <32 x i16>* %bp, <32 x i8>* %dest) #0 {
		; VBITS_GE_SVE_128-LABEL: masked_store_trunc_v32i16i8:
		; VBITS_GE_SVE_128: // %bb.0:
		; VBITS_GE_SVE_128-NEXT: ldp q0, q5, [x1]
		; VBITS_GE_SVE_128-NEXT: mov w8, #16
		; VBITS_GE_SVE_128-NEXT: ptrue p0.b, vl16
		; VBITS_GE_SVE_128-NEXT: ldp q1, q2, [x0, #32]
		; VBITS_GE_SVE_128-NEXT: ldp q3, q4, [x0]
		; VBITS_GE_SVE_128-NEXT: cmeq v0.8h, v3.8h, v0.8h
		; VBITS_GE_SVE_128-NEXT: ldp q6, q7, [x1, #32]
		; VBITS_GE_SVE_128-NEXT: cmeq v5.8h, v4.8h, v5.8h
		; VBITS_GE_SVE_128-NEXT: uzp1 v3.16b, v3.16b, v4.16b
		; VBITS_GE_SVE_128-NEXT: uzp1 v0.16b, v0.16b, v5.16b
		; VBITS_GE_SVE_128-NEXT: cmeq v6.8h, v1.8h, v6.8h
		; VBITS_GE_SVE_128-NEXT: uzp1 v1.16b, v1.16b, v2.16b
		; VBITS_GE_SVE_128-NEXT: cmeq v7.8h, v2.8h, v7.8h
		; VBITS_GE_SVE_128-NEXT: cmpne p1.b, p0/z, z0.b, #0
		; VBITS_GE_SVE_128-NEXT: uzp1 v5.16b, v6.16b, v7.16b
		; VBITS_GE_SVE_128-NEXT: cmpne p0.b, p0/z, z5.b, #0
		; VBITS_GE_SVE_128-NEXT: st1b { z1.b }, p0, [x2, x8]
		; VBITS_GE_SVE_128-NEXT: st1b { z3.b }, p1, [x2]
		; VBITS_GE_SVE_128-NEXT: ret
		;
		; VBITS_GE_128-LABEL: masked_store_trunc_v32i16i8:
		; VBITS_GE_128: // %bb.0:
		; VBITS_GE_128-NEXT: sub sp, sp, #16
		; VBITS_GE_128-NEXT: .cfi_def_cfa_offset 16
		; VBITS_GE_128-NEXT: ldp q0, q1, [x0, #32]
		; VBITS_GE_128-NEXT: ldp q2, q4, [x1, #32]
		; VBITS_GE_128-NEXT: cmeq v5.8h, v0.8h, v2.8h
		; VBITS_GE_128-NEXT: xtn v5.8b, v5.8h
		; VBITS_GE_128-NEXT: cmeq v4.8h, v1.8h, v4.8h
		; VBITS_GE_128-NEXT: umov w8, v5.b[1]
		; VBITS_GE_128-NEXT: umov w9, v5.b[2]
		; VBITS_GE_128-NEXT: umov w10, v5.b[0]
		; VBITS_GE_128-NEXT: umov w11, v5.b[3]
		; VBITS_GE_128-NEXT: umov w12, v5.b[4]
		; VBITS_GE_128-NEXT: umov w13, v5.b[5]
		; VBITS_GE_128-NEXT: xtn v4.8b, v4.8h
		; VBITS_GE_128-NEXT: and w8, w8, #0x1
		; VBITS_GE_128-NEXT: umov w14, v5.b[6]
		; VBITS_GE_128-NEXT: ldp q3, q2, [x0]
		; VBITS_GE_128-NEXT: and w9, w9, #0x1
		; VBITS_GE_128-NEXT: and w10, w10, #0x1
		; VBITS_GE_128-NEXT: and w11, w11, #0x1
		; VBITS_GE_128-NEXT: umov w15, v5.b[7]
		; VBITS_GE_128-NEXT: bfi w10, w8, #1, #1
		; VBITS_GE_128-NEXT: and w12, w12, #0x1
		; VBITS_GE_128-NEXT: umov w16, v4.b[0]
		; VBITS_GE_128-NEXT: bfi w10, w9, #2, #1
		; VBITS_GE_128-NEXT: and w13, w13, #0x1
		; VBITS_GE_128-NEXT: umov w8, v4.b[1]
		; VBITS_GE_128-NEXT: ldp q7, q6, [x1]
		; VBITS_GE_128-NEXT: bfi w10, w11, #3, #1
		; VBITS_GE_128-NEXT: umov w9, v4.b[2]
		; VBITS_GE_128-NEXT: and w14, w14, #0x1
		; VBITS_GE_128-NEXT: bfi w10, w12, #4, #1
		; VBITS_GE_128-NEXT: umov w11, v4.b[3]
		; VBITS_GE_128-NEXT: and w15, w15, #0x1
		; VBITS_GE_128-NEXT: cmeq v5.8h, v3.8h, v7.8h
		; VBITS_GE_128-NEXT: bfi w10, w13, #5, #1
		; VBITS_GE_128-NEXT: and w16, w16, #0x1
		; VBITS_GE_128-NEXT: orr w10, w10, w14, lsl #6
		; VBITS_GE_128-NEXT: xtn v5.8b, v5.8h
		; VBITS_GE_128-NEXT: and w8, w8, #0x1
		; VBITS_GE_128-NEXT: umov w12, v4.b[4]
		; VBITS_GE_128-NEXT: orr w10, w10, w15, lsl #7
		; VBITS_GE_128-NEXT: umov w13, v5.b[1]
		; VBITS_GE_128-NEXT: umov w14, v5.b[2]
		; VBITS_GE_128-NEXT: and w9, w9, #0x1
		; VBITS_GE_128-NEXT: orr w10, w10, w16, lsl #8
		; VBITS_GE_128-NEXT: and w11, w11, #0x1
		; VBITS_GE_128-NEXT: orr w8, w10, w8, lsl #9
		; VBITS_GE_128-NEXT: orr w8, w8, w9, lsl #10
		; VBITS_GE_128-NEXT: and w10, w12, #0x1
		; VBITS_GE_128-NEXT: umov w9, v5.b[0]
		; VBITS_GE_128-NEXT: orr w8, w8, w11, lsl #11
		; VBITS_GE_128-NEXT: umov w11, v4.b[5]
		; VBITS_GE_128-NEXT: and w12, w13, #0x1
		; VBITS_GE_128-NEXT: and w13, w14, #0x1
		; VBITS_GE_128-NEXT: umov w14, v5.b[3]
		; VBITS_GE_128-NEXT: umov w15, v5.b[4]
		; VBITS_GE_128-NEXT: umov w16, v5.b[5]
		; VBITS_GE_128-NEXT: and w9, w9, #0x1
		; VBITS_GE_128-NEXT: orr w8, w8, w10, lsl #12
		; VBITS_GE_128-NEXT: and w10, w11, #0x1
		; VBITS_GE_128-NEXT: bfi w9, w12, #1, #1
		; VBITS_GE_128-NEXT: and w11, w14, #0x1
		; VBITS_GE_128-NEXT: umov w14, v5.b[6]
		; VBITS_GE_128-NEXT: and w12, w15, #0x1
		; VBITS_GE_128-NEXT: bfi w9, w13, #2, #1
		; VBITS_GE_128-NEXT: cmeq v6.8h, v2.8h, v6.8h
		; VBITS_GE_128-NEXT: and w13, w16, #0x1
		; VBITS_GE_128-NEXT: bfi w9, w11, #3, #1
		; VBITS_GE_128-NEXT: umov w11, v5.b[7]
		; VBITS_GE_128-NEXT: xtn v5.8b, v6.8h
		; VBITS_GE_128-NEXT: bfi w9, w12, #4, #1
		; VBITS_GE_128-NEXT: umov w12, v4.b[6]
		; VBITS_GE_128-NEXT: bfi w9, w13, #5, #1
		; VBITS_GE_128-NEXT: and w13, w14, #0x1
		; VBITS_GE_128-NEXT: umov w14, v5.b[0]
		; VBITS_GE_128-NEXT: and w11, w11, #0x1
		; VBITS_GE_128-NEXT: umov w15, v5.b[1]
		; VBITS_GE_128-NEXT: orr w8, w8, w10, lsl #13
		; VBITS_GE_128-NEXT: orr w9, w9, w13, lsl #6
		; VBITS_GE_128-NEXT: and w10, w12, #0x1
		; VBITS_GE_128-NEXT: umov w12, v5.b[2]
		; VBITS_GE_128-NEXT: orr w9, w9, w11, lsl #7
		; VBITS_GE_128-NEXT: and w11, w14, #0x1
		; VBITS_GE_128-NEXT: umov w14, v5.b[3]
		; VBITS_GE_128-NEXT: and w13, w15, #0x1
		; VBITS_GE_128-NEXT: orr w8, w8, w10, lsl #14
		; VBITS_GE_128-NEXT: umov w10, v5.b[4]
		; VBITS_GE_128-NEXT: orr w9, w9, w11, lsl #8
		; VBITS_GE_128-NEXT: and w11, w12, #0x1
		; VBITS_GE_128-NEXT: umov w12, v5.b[5]
		; VBITS_GE_128-NEXT: orr w9, w9, w13, lsl #9
		; VBITS_GE_128-NEXT: and w13, w14, #0x1
		; VBITS_GE_128-NEXT: umov w14, v5.b[6]
		; VBITS_GE_128-NEXT: orr w9, w9, w11, lsl #10
		; VBITS_GE_128-NEXT: umov w11, v4.b[7]
		; VBITS_GE_128-NEXT: and w10, w10, #0x1
		; VBITS_GE_128-NEXT: orr w9, w9, w13, lsl #11
		; VBITS_GE_128-NEXT: and w12, w12, #0x1
		; VBITS_GE_128-NEXT: umov w13, v5.b[7]
		; VBITS_GE_128-NEXT: orr w9, w9, w10, lsl #12
		; VBITS_GE_128-NEXT: and w10, w14, #0x1
		; VBITS_GE_128-NEXT: orr w11, w8, w11, lsl #15
		; VBITS_GE_128-NEXT: orr w8, w9, w12, lsl #13
		; VBITS_GE_128-NEXT: orr w8, w8, w10, lsl #14
		; VBITS_GE_128-NEXT: orr w8, w8, w13, lsl #15
		; VBITS_GE_128-NEXT: uzp1 v2.16b, v3.16b, v2.16b
		; VBITS_GE_128-NEXT: bfi w8, w11, #16, #16
		; VBITS_GE_128-NEXT: tbnz w8, #0, .LBB14_33
		; VBITS_GE_128-NEXT: // %bb.1: // %else
		; VBITS_GE_128-NEXT: tbnz w8, #1, .LBB14_34
		; VBITS_GE_128-NEXT: .LBB14_2: // %else2
		; VBITS_GE_128-NEXT: tbnz w8, #2, .LBB14_35
		; VBITS_GE_128-NEXT: .LBB14_3: // %else4
		; VBITS_GE_128-NEXT: tbnz w8, #3, .LBB14_36
		; VBITS_GE_128-NEXT: .LBB14_4: // %else6
		; VBITS_GE_128-NEXT: tbnz w8, #4, .LBB14_37
		; VBITS_GE_128-NEXT: .LBB14_5: // %else8
		; VBITS_GE_128-NEXT: tbnz w8, #5, .LBB14_38
		; VBITS_GE_128-NEXT: .LBB14_6: // %else10
		; VBITS_GE_128-NEXT: tbnz w8, #6, .LBB14_39
		; VBITS_GE_128-NEXT: .LBB14_7: // %else12
		; VBITS_GE_128-NEXT: tbnz w8, #7, .LBB14_40
		; VBITS_GE_128-NEXT: .LBB14_8: // %else14
		; VBITS_GE_128-NEXT: tbnz w8, #8, .LBB14_41
		; VBITS_GE_128-NEXT: .LBB14_9: // %else16
		; VBITS_GE_128-NEXT: tbnz w8, #9, .LBB14_42
		; VBITS_GE_128-NEXT: .LBB14_10: // %else18
		; VBITS_GE_128-NEXT: tbnz w8, #10, .LBB14_43
		; VBITS_GE_128-NEXT: .LBB14_11: // %else20
		; VBITS_GE_128-NEXT: tbnz w8, #11, .LBB14_44
		; VBITS_GE_128-NEXT: .LBB14_12: // %else22
		; VBITS_GE_128-NEXT: tbnz w8, #12, .LBB14_45
		; VBITS_GE_128-NEXT: .LBB14_13: // %else24
		; VBITS_GE_128-NEXT: tbnz w8, #13, .LBB14_46
		; VBITS_GE_128-NEXT: .LBB14_14: // %else26
		; VBITS_GE_128-NEXT: tbnz w8, #14, .LBB14_47
		; VBITS_GE_128-NEXT: .LBB14_15: // %else28
		; VBITS_GE_128-NEXT: tbnz w8, #15, .LBB14_48
		; VBITS_GE_128-NEXT: .LBB14_16: // %else30
		; VBITS_GE_128-NEXT: uzp1 v0.16b, v0.16b, v1.16b
		; VBITS_GE_128-NEXT: tbnz w8, #16, .LBB14_49
		; VBITS_GE_128-NEXT: .LBB14_17: // %else32
		; VBITS_GE_128-NEXT: tbnz w8, #17, .LBB14_50
		; VBITS_GE_128-NEXT: .LBB14_18: // %else34
		; VBITS_GE_128-NEXT: tbnz w8, #18, .LBB14_51
		; VBITS_GE_128-NEXT: .LBB14_19: // %else36
		; VBITS_GE_128-NEXT: tbnz w8, #19, .LBB14_52
		; VBITS_GE_128-NEXT: .LBB14_20: // %else38
		; VBITS_GE_128-NEXT: tbnz w8, #20, .LBB14_53
		; VBITS_GE_128-NEXT: .LBB14_21: // %else40
		; VBITS_GE_128-NEXT: tbnz w8, #21, .LBB14_54
		; VBITS_GE_128-NEXT: .LBB14_22: // %else42
		; VBITS_GE_128-NEXT: tbnz w8, #22, .LBB14_55
		; VBITS_GE_128-NEXT: .LBB14_23: // %else44
		; VBITS_GE_128-NEXT: tbnz w8, #23, .LBB14_56
		; VBITS_GE_128-NEXT: .LBB14_24: // %else46
		; VBITS_GE_128-NEXT: tbnz w8, #24, .LBB14_57
		; VBITS_GE_128-NEXT: .LBB14_25: // %else48
		; VBITS_GE_128-NEXT: tbnz w8, #25, .LBB14_58
		; VBITS_GE_128-NEXT: .LBB14_26: // %else50
		; VBITS_GE_128-NEXT: tbnz w8, #26, .LBB14_59
		; VBITS_GE_128-NEXT: .LBB14_27: // %else52
		; VBITS_GE_128-NEXT: tbnz w8, #27, .LBB14_60
		; VBITS_GE_128-NEXT: .LBB14_28: // %else54
		; VBITS_GE_128-NEXT: tbnz w8, #28, .LBB14_61
		; VBITS_GE_128-NEXT: .LBB14_29: // %else56
		; VBITS_GE_128-NEXT: tbnz w8, #29, .LBB14_62
		; VBITS_GE_128-NEXT: .LBB14_30: // %else58
		; VBITS_GE_128-NEXT: tbnz w8, #30, .LBB14_63
		; VBITS_GE_128-NEXT: .LBB14_31: // %else60
		; VBITS_GE_128-NEXT: tbnz w8, #31, .LBB14_64
		; VBITS_GE_128-NEXT: .LBB14_32: // %else62
		; VBITS_GE_128-NEXT: add sp, sp, #16
		; VBITS_GE_128-NEXT: ret
		; VBITS_GE_128-NEXT: .LBB14_33: // %cond.store
		; VBITS_GE_128-NEXT: st1 { v2.b }[0], [x2]
		; VBITS_GE_128-NEXT: tbz w8, #1, .LBB14_2
		; VBITS_GE_128-NEXT: .LBB14_34: // %cond.store1
		; VBITS_GE_128-NEXT: add x9, x2, #1
		; VBITS_GE_128-NEXT: st1 { v2.b }[1], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #2, .LBB14_3
		; VBITS_GE_128-NEXT: .LBB14_35: // %cond.store3
		; VBITS_GE_128-NEXT: add x9, x2, #2
		; VBITS_GE_128-NEXT: st1 { v2.b }[2], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #3, .LBB14_4
		; VBITS_GE_128-NEXT: .LBB14_36: // %cond.store5
		; VBITS_GE_128-NEXT: add x9, x2, #3
		; VBITS_GE_128-NEXT: st1 { v2.b }[3], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #4, .LBB14_5
		; VBITS_GE_128-NEXT: .LBB14_37: // %cond.store7
		; VBITS_GE_128-NEXT: add x9, x2, #4
		; VBITS_GE_128-NEXT: st1 { v2.b }[4], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #5, .LBB14_6
		; VBITS_GE_128-NEXT: .LBB14_38: // %cond.store9
		; VBITS_GE_128-NEXT: add x9, x2, #5
		; VBITS_GE_128-NEXT: st1 { v2.b }[5], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #6, .LBB14_7
		; VBITS_GE_128-NEXT: .LBB14_39: // %cond.store11
		; VBITS_GE_128-NEXT: add x9, x2, #6
		; VBITS_GE_128-NEXT: st1 { v2.b }[6], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #7, .LBB14_8
		; VBITS_GE_128-NEXT: .LBB14_40: // %cond.store13
		; VBITS_GE_128-NEXT: add x9, x2, #7
		; VBITS_GE_128-NEXT: st1 { v2.b }[7], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #8, .LBB14_9
		; VBITS_GE_128-NEXT: .LBB14_41: // %cond.store15
		; VBITS_GE_128-NEXT: add x9, x2, #8
		; VBITS_GE_128-NEXT: st1 { v2.b }[8], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #9, .LBB14_10
		; VBITS_GE_128-NEXT: .LBB14_42: // %cond.store17
		; VBITS_GE_128-NEXT: add x9, x2, #9
		; VBITS_GE_128-NEXT: st1 { v2.b }[9], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #10, .LBB14_11
		; VBITS_GE_128-NEXT: .LBB14_43: // %cond.store19
		; VBITS_GE_128-NEXT: add x9, x2, #10
		; VBITS_GE_128-NEXT: st1 { v2.b }[10], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #11, .LBB14_12
		; VBITS_GE_128-NEXT: .LBB14_44: // %cond.store21
		; VBITS_GE_128-NEXT: add x9, x2, #11
		; VBITS_GE_128-NEXT: st1 { v2.b }[11], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #12, .LBB14_13
		; VBITS_GE_128-NEXT: .LBB14_45: // %cond.store23
		; VBITS_GE_128-NEXT: add x9, x2, #12
		; VBITS_GE_128-NEXT: st1 { v2.b }[12], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #13, .LBB14_14
		; VBITS_GE_128-NEXT: .LBB14_46: // %cond.store25
		; VBITS_GE_128-NEXT: add x9, x2, #13
		; VBITS_GE_128-NEXT: st1 { v2.b }[13], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #14, .LBB14_15
		; VBITS_GE_128-NEXT: .LBB14_47: // %cond.store27
		; VBITS_GE_128-NEXT: add x9, x2, #14
		; VBITS_GE_128-NEXT: st1 { v2.b }[14], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #15, .LBB14_16
		; VBITS_GE_128-NEXT: .LBB14_48: // %cond.store29
		; VBITS_GE_128-NEXT: add x9, x2, #15
		; VBITS_GE_128-NEXT: st1 { v2.b }[15], [x9]
		; VBITS_GE_128-NEXT: uzp1 v0.16b, v0.16b, v1.16b
		; VBITS_GE_128-NEXT: tbz w8, #16, .LBB14_17
		; VBITS_GE_128-NEXT: .LBB14_49: // %cond.store31
		; VBITS_GE_128-NEXT: add x9, x2, #16
		; VBITS_GE_128-NEXT: st1 { v0.b }[0], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #17, .LBB14_18
		; VBITS_GE_128-NEXT: .LBB14_50: // %cond.store33
		; VBITS_GE_128-NEXT: add x9, x2, #17
		; VBITS_GE_128-NEXT: st1 { v0.b }[1], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #18, .LBB14_19
		; VBITS_GE_128-NEXT: .LBB14_51: // %cond.store35
		; VBITS_GE_128-NEXT: add x9, x2, #18
		; VBITS_GE_128-NEXT: st1 { v0.b }[2], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #19, .LBB14_20
		; VBITS_GE_128-NEXT: .LBB14_52: // %cond.store37
		; VBITS_GE_128-NEXT: add x9, x2, #19
		; VBITS_GE_128-NEXT: st1 { v0.b }[3], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #20, .LBB14_21
		; VBITS_GE_128-NEXT: .LBB14_53: // %cond.store39
		; VBITS_GE_128-NEXT: add x9, x2, #20
		; VBITS_GE_128-NEXT: st1 { v0.b }[4], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #21, .LBB14_22
		; VBITS_GE_128-NEXT: .LBB14_54: // %cond.store41
		; VBITS_GE_128-NEXT: add x9, x2, #21
		; VBITS_GE_128-NEXT: st1 { v0.b }[5], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #22, .LBB14_23
		; VBITS_GE_128-NEXT: .LBB14_55: // %cond.store43
		; VBITS_GE_128-NEXT: add x9, x2, #22
		; VBITS_GE_128-NEXT: st1 { v0.b }[6], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #23, .LBB14_24
		; VBITS_GE_128-NEXT: .LBB14_56: // %cond.store45
		; VBITS_GE_128-NEXT: add x9, x2, #23
		; VBITS_GE_128-NEXT: st1 { v0.b }[7], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #24, .LBB14_25
		; VBITS_GE_128-NEXT: .LBB14_57: // %cond.store47
		; VBITS_GE_128-NEXT: add x9, x2, #24
		; VBITS_GE_128-NEXT: st1 { v0.b }[8], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #25, .LBB14_26
		; VBITS_GE_128-NEXT: .LBB14_58: // %cond.store49
		; VBITS_GE_128-NEXT: add x9, x2, #25
		; VBITS_GE_128-NEXT: st1 { v0.b }[9], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #26, .LBB14_27
		; VBITS_GE_128-NEXT: .LBB14_59: // %cond.store51
		; VBITS_GE_128-NEXT: add x9, x2, #26
		; VBITS_GE_128-NEXT: st1 { v0.b }[10], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #27, .LBB14_28
		; VBITS_GE_128-NEXT: .LBB14_60: // %cond.store53
		; VBITS_GE_128-NEXT: add x9, x2, #27
		; VBITS_GE_128-NEXT: st1 { v0.b }[11], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #28, .LBB14_29
		; VBITS_GE_128-NEXT: .LBB14_61: // %cond.store55
		; VBITS_GE_128-NEXT: add x9, x2, #28
		; VBITS_GE_128-NEXT: st1 { v0.b }[12], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #29, .LBB14_30
		; VBITS_GE_128-NEXT: .LBB14_62: // %cond.store57
		; VBITS_GE_128-NEXT: add x9, x2, #29
		; VBITS_GE_128-NEXT: st1 { v0.b }[13], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #30, .LBB14_31
		; VBITS_GE_128-NEXT: .LBB14_63: // %cond.store59
		; VBITS_GE_128-NEXT: add x9, x2, #30
		; VBITS_GE_128-NEXT: st1 { v0.b }[14], [x9]
		; VBITS_GE_128-NEXT: tbz w8, #31, .LBB14_32
		; VBITS_GE_128-NEXT: .LBB14_64: // %cond.store61
		; VBITS_GE_128-NEXT: add x8, x2, #31
		; VBITS_GE_128-NEXT: st1 { v0.b }[15], [x8]
		; VBITS_GE_128-NEXT: add sp, sp, #16
		; VBITS_GE_128-NEXT: ret
		;
; VBITS_GE_256-LABEL: masked_store_trunc_v32i16i8:
; VBITS_GE_256: // %bb.0:
; VBITS_GE_256-NEXT: mov x8, #16
; VBITS_GE_256-NEXT: ptrue p0.h, vl16
; VBITS_GE_256-NEXT: ld1h { z0.h }, p0/z, [x0, x8, lsl #1]
; VBITS_GE_256-NEXT: ld1h { z1.h }, p0/z, [x0]
; VBITS_GE_256-NEXT: ld1h { z2.h }, p0/z, [x1, x8, lsl #1]
; VBITS_GE_256-NEXT: ld1h { z3.h }, p0/z, [x1]
[... 31 lines not shown ...]

declare void @llvm.masked.store.v2f16(<2 x half>, <2 x half>*, i32, <2 x i1>)
declare void @llvm.masked.store.v2f32(<2 x float>, <2 x float>*, i32, <2 x i1>)
declare void @llvm.masked.store.v4f32(<4 x float>, <4 x float>*, i32, <4 x i1>)
declare void @llvm.masked.store.v8f32(<8 x float>, <8 x float>*, i32, <8 x i1>)
declare void @llvm.masked.store.v16f32(<16 x float>, <16 x float>*, i32, <16 x i1>)
declare void @llvm.masked.store.v32f32(<32 x float>, <32 x float>*, i32, <32 x i1>)
declare void @llvm.masked.store.v64f32(<64 x float>, <64 x float>*, i32, <64 x i1>)
		declare void @llvm.masked.store.v2f64(<2 x double>, <2 x double>*, i32, <2 x i1>)

declare void @llvm.masked.store.v8i8(<8 x i8>, <8 x i8>*, i32, <8 x i1>)
declare void @llvm.masked.store.v8i16(<8 x i16>, <8 x i16>*, i32, <8 x i1>)
declare void @llvm.masked.store.v8i32(<8 x i32>, <8 x i32>*, i32, <8 x i1>)
declare void @llvm.masked.store.v16i8(<16 x i8>, <16 x i8>*, i32, <16 x i1>)
declare void @llvm.masked.store.v16i16(<16 x i16>, <16 x i16>*, i32, <16 x i1>)
declare void @llvm.masked.store.v32i8(<32 x i8>, <32 x i8>*, i32, <32 x i1>)

attributes #0 = { "target-features"="+sve" }
kmclaughlin: nit: extra whitespace :)

llvm/test/CodeGen/AArch64/sve-masked-load-store.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc --force-sve-128bit-vector < %s | FileCheck %s

				target triple = "aarch64-unknown-linux-gnu"

				; load v16i8
				define <16 x i8> @masked_load_v16i8(<16 x i8>* %src, <16 x i1> %mask) vscale_range(1,16) #0 {
				; CHECK-LABEL: masked_load_v16i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: shl v0.16b, v0.16b, #7
				; CHECK-NEXT: ptrue p0.b, vl16
				; CHECK-NEXT: cmlt v0.16b, v0.16b, #0
				; CHECK-NEXT: cmpne p0.b, p0/z, z0.b, #0
				; CHECK-NEXT: ld1b { z0.b }, p0/z, [x0]
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%load = call <16 x i8> @llvm.masked.load.v16i8(<16 x i8>* %src, i32 8, <16 x i1> %mask, <16 x i8> zeroinitializer)
				ret <16 x i8> %load
				}
				; store v16i8
				define void @masked_store_v16i8(<16 x i8>* %dst, <16 x i1> %mask) vscale_range(1,16) #0 {
				; CHECK-LABEL: masked_store_v16i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: shl v0.16b, v0.16b, #7
				; CHECK-NEXT: ptrue p0.b, vl16
				; CHECK-NEXT: movi v1.2d, #0000000000000000
				; CHECK-NEXT: cmlt v0.16b, v0.16b, #0
				; CHECK-NEXT: cmpne p0.b, p0/z, z0.b, #0
				; CHECK-NEXT: st1b { z1.b }, p0, [x0]
				; CHECK-NEXT: ret
				call void @llvm.masked.store.v16i8(<16 x i8> zeroinitializer, <16 x i8>* %dst, i32 8, <16 x i1> %mask)
				ret void
				}

				; load v4f32
				define <4 x float> @masked_load_v4f32(<4 x float>* %src, <4 x i1> %mask) vscale_range(1,16) #0 {
				; CHECK-LABEL: masked_load_v4f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ushll v0.4s, v0.4h, #0
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: shl v0.4s, v0.4s, #31
				; CHECK-NEXT: cmlt v0.4s, v0.4s, #0
				; CHECK-NEXT: cmpne p0.s, p0/z, z0.s, #0
				; CHECK-NEXT: ld1w { z0.s }, p0/z, [x0]
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%load = call <4 x float> @llvm.masked.load.v4f32(<4 x float>* %src, i32 8, <4 x i1> %mask, <4 x float> zeroinitializer)
				ret <4 x float> %load
				}

				; store v4f32
				define void @masked_store_v4f32(<4 x float>* %dst, <4 x i1> %mask) vscale_range(1,16) #0 {
				; CHECK-LABEL: masked_store_v4f32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ushll v0.4s, v0.4h, #0
				; CHECK-NEXT: ptrue p0.s, vl4
				; CHECK-NEXT: movi v1.2d, #0000000000000000
				; CHECK-NEXT: shl v0.4s, v0.4s, #31
				; CHECK-NEXT: cmlt v0.4s, v0.4s, #0
				; CHECK-NEXT: cmpne p0.s, p0/z, z0.s, #0
				; CHECK-NEXT: st1w { z1.s }, p0, [x0]
				; CHECK-NEXT: ret
				call void @llvm.masked.store.v4f32(<4 x float> zeroinitializer, <4 x float>* %dst, i32 8, <4 x i1> %mask)
				ret void
				}

				; load v2f64
				define <2 x double> @masked_load_v2f64(<2 x double>* %src, <2 x i1> %mask) vscale_range(1,16) #0 {
				; CHECK-LABEL: masked_load_v2f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ushll v0.2d, v0.2s, #0
				; CHECK-NEXT: ptrue p0.d, vl2
				; CHECK-NEXT: shl v0.2d, v0.2d, #63
				; CHECK-NEXT: cmlt v0.2d, v0.2d, #0
				; CHECK-NEXT: cmpne p0.d, p0/z, z0.d, #0
				; CHECK-NEXT: ld1d { z0.d }, p0/z, [x0]
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%load = call <2 x double> @llvm.masked.load.v2f64(<2 x double>* %src, i32 8, <2 x i1> %mask, <2 x double> zeroinitializer)
				ret <2 x double> %load
				}

				; store v2f64
				define void @masked_store_v2f64(<2 x double>* %dst, <2 x i1> %mask) vscale_range(1,16) #0 {
				; CHECK-LABEL: masked_store_v2f64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ushll v0.2d, v0.2s, #0
				; CHECK-NEXT: ptrue p0.d, vl2
				; CHECK-NEXT: movi v1.2d, #0000000000000000
				; CHECK-NEXT: shl v0.2d, v0.2d, #63
				; CHECK-NEXT: cmlt v0.2d, v0.2d, #0
				; CHECK-NEXT: cmpne p0.d, p0/z, z0.d, #0
				; CHECK-NEXT: st1d { z1.d }, p0, [x0]
				; CHECK-NEXT: ret
				call void @llvm.masked.store.v2f64(<2 x double> zeroinitializer, <2 x double>* %dst, i32 8, <2 x i1> %mask)
				ret void
				}

				attributes #0 = { "target-features"="+sve" }

				declare <16 x i8> @llvm.masked.load.v16i8(<16 x i8>*, i32, <16 x i1>, <16 x i8>)
				declare void @llvm.masked.store.v16i8(<16 x i8>, <16 x i8>*, i32, <16 x i1>)

				declare <4 x float> @llvm.masked.load.v4f32(<4 x float>*, i32, <4 x i1>, <4 x float>)
				declare void @llvm.masked.store.v4f32(<4 x float>, <4 x float>*, i32, <4 x i1>)

				declare <2 x double> @llvm.masked.load.v2f64(<2 x double>*, i32, <2 x i1>, <2 x double>)
				declare void @llvm.masked.store.v2f64(<2 x double>, <2 x double>*, i32, <2 x i1>)
