This is an archive of the discontinued LLVM Phabricator instance.

Differential D108115

[DAG][sve] Lowering for VLS masked truncating stores
Closed, Public

Authored by DavidTruby on Aug 16 2021, 3:39 AM.

Details

Reviewers

dmgreen
bsmith
efriedma
SjoerdMeijer
peterwaller-arm
paulwalker-arm
craig.topper
RKSimon

Commits

rG5c9684704d15: [DAG][sve] Lowering for VLS masked truncating stores

Summary

This extends the custom lowering for truncating stores on
fixed length vectors in SVE to support masked truncating stores.
It also adds a DAG combine for truncates followed by masked
stores.
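
As a rough illustration of why the combine is sound, the sketch below models both forms on plain arrays. It is a toy scalar model, not the DAG code: `truncThenMaskedStore`, `maskedTruncStore`, and the lane count are hypothetical names chosen for this example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Number of lanes in this toy model (matches the <8 x i64> test cases).
constexpr std::size_t kLanes = 8;

using WideVec = std::array<std::uint64_t, kLanes>;
using NarrowVec = std::array<std::uint8_t, kLanes>;
using MaskVec = std::array<bool, kLanes>;

// Unfused form: truncate every lane first, then masked-store the narrow vector.
inline void truncThenMaskedStore(const WideVec &val, const MaskVec &mask,
                                 NarrowVec &mem) {
  NarrowVec narrow{};
  for (std::size_t i = 0; i < kLanes; ++i)
    narrow[i] = static_cast<std::uint8_t>(val[i]); // keeps the low 8 bits
  for (std::size_t i = 0; i < kLanes; ++i)
    if (mask[i])
      mem[i] = narrow[i];
}

// Fused form: a masked truncating store narrows each active lane as it is
// written; inactive lanes leave memory untouched.
inline void maskedTruncStore(const WideVec &val, const MaskVec &mask,
                             NarrowVec &mem) {
  for (std::size_t i = 0; i < kLanes; ++i)
    if (mask[i])
      mem[i] = static_cast<std::uint8_t>(val[i]);
}
```

Both functions leave memory in the same state for any input, which is the property the fold relies on.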

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	90 ms	x64 debian > MLIR.Target/LLVMIR::llvmir-intrinsics.mlir

Event Timeline

DavidTruby created this revision. Aug 16 2021, 3:39 AM

Herald added a reviewer: efriedma. Aug 16 2021, 3:39 AM

Herald added subscribers: ctetreau, ecnelises, psnobl and 2 others.

DavidTruby requested review of this revision. Aug 16 2021, 3:39 AM

Herald added a project: Restricted Project. Aug 16 2021, 3:39 AM

Herald added a subscriber: llvm-commits.

DavidTruby added a reviewer: SjoerdMeijer. Aug 16 2021, 3:39 AM

DavidTruby added inline comments.

llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll
1462 ↗	(On Diff #366581)	These changes aren't correct, I have pinged @dmgreen and @SjoerdMeijer for review to see if they know what's happening here. My suspicion is that there's a target specific combine in Thumb2 for masked truncating stores that my general combine is blocking; I guess the easy fix here is to add an exemption for Thumb2 in the TLI hook but I'm not sure if that's the right thing to do?

Harbormaster completed remote builds in B119675: Diff 366581. Aug 16 2021, 4:01 AM

peterwaller-arm added a reviewer: peterwaller-arm. Aug 17 2021, 5:36 AM

Matt added a subscriber: Matt. Aug 17 2021, 9:24 AM

ping

SjoerdMeijer added inline comments. Sep 7 2021, 6:40 AM

llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll
1462 ↗	(On Diff #366581)	Sorry for the delay! Yeah, this extra codegen is not ideal (I think we agree this is a perf regression, not a correctness issue, but that aside). If we could avoid this, that would be best. As I understood it, that would involve adding this target hook to the ARM backend: `virtual bool canCombineTruncStore(EVT ValVT, EVT MemVT, bool LegalOperations) const override { ... }`. Would you mind adding this to the ARM backend? I have no preference whether you include that here (I think that would make sense) or do it separately and create a dependent patch. But I think it would be good if we could see evidence that this regression disappears before committing this.

Sorry I didn't see this. I get a lot of Phabricator spam and this wasn't very obvious from the title. It should probably have [DAG]; [llvm] doesn't say much when the whole project is llvm ;)

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9822	I'm surprised to see FP_ROUND here. I guess it gets handled in the same way as a normal non-masked trunc store?
llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll
1462 ↗	(On Diff #366581)	We don't want canCombineTruncStore. I think you need to add demanded bits for the truncating masked store. As in https://github.com/llvm/llvm-project/blob/b50a60c234433545fc1c9b39f193373f560ea869/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L18115
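
The demanded-bits idea suggested here can be sketched on plain integers: a truncating store only reads the low `memBits` bits of each lane, so any earlier operation that leaves those bits unchanged is redundant. The helper names below are hypothetical, and `lowBitsSet` only loosely mirrors what `APInt::getLowBitsSet` does for a 64-bit value.

```cpp
#include <cstdint>

// Mask with the low `bits` bits set (a 64-bit stand-in for an APInt mask).
constexpr std::uint64_t lowBitsSet(unsigned bits) {
  return bits >= 64 ? ~std::uint64_t{0} : ((std::uint64_t{1} << bits) - 1);
}

// What a truncating store of one lane actually writes: only the low bits.
constexpr std::uint64_t truncStoreLane(std::uint64_t v, unsigned memBits) {
  return v & lowBitsSet(memBits);
}

// An AND with a constant feeding the store is redundant when it preserves
// every bit the store demands.
constexpr bool andIsRedundant(std::uint64_t andMask, unsigned memBits) {
  return (andMask & lowBitsSet(memBits)) == lowBitsSet(memBits);
}
```

For example, an `and x, 0xFFFF` feeding an 8-bit truncating store can be dropped, while an `and x, 0x0F` cannot.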

DavidTruby added inline comments. Sep 7 2021, 7:22 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9822	I just took this from the existing non-masked truncating store code. I think it gets handled by canCombineTruncStore checking if the truncating store has been marked legal or custom, which it won't have been if the architecture doesn't support floating point truncating stores.

Fix Thumb2 regression

DavidTruby added inline comments. Sep 7 2021, 8:42 AM

llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll
1462 ↗	(On Diff #366581)	Thanks for the pointer, adding this seems to have fixed the regression

Harbormaster completed remote builds in B122883: Diff 371092. Sep 7 2021, 9:52 AM

Thanks for fixing the MVE issues. Most of the changes here don't look SVE related, being generic DAG combines. It would probably be best to split them into a separate review to keep logically separable additions in different reviews in case there are issues with them.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9868	Is this tested anywhere?

Remove unused and untested code

In D108115#3001669, @dmgreen wrote:

Thanks for fixing the MVE issues. Most of the changes here don't look SVE related, being generic DAG combines. It would probably be best to split them into a separate review to keep logically separable additions in different reviews in case there are issues with them.

I'm nervous about splitting the DAG combine into a separate patch because it's currently only triggered by SVE code generation; adding it in a separate patch leaves it unused and untested in that patch which I'd rather try and avoid.

Harbormaster completed remote builds in B124426: Diff 373262. Sep 17 2021, 10:50 AM

I've reverted 734708e04f84b72f1ae7c8b35c002b8bf97dc064 to fix the 2 stage vls bot while this is in review. Please reland both when this is approved.

peterwaller-arm added a reviewer: paulwalker-arm. Oct 4 2021, 6:37 AM

DavidTruby retitled this revision from [llvm][sve] Lowering for VLS masked truncating stores to [DAG][sve] Lowering for VLS masked truncating stores. Oct 7 2021, 3:17 AM

Sorry for the delay. I missed this as something I should be looking at.

I've had a look around, and I can't see anything obvious that goes wrong with this. canCombineTruncStore is a bit overloaded between stores and masked stores - it relies on the support being symmetrical.

I'm nervous about splitting the DAG combine into a separate patch because it's currently only triggered by SVE code generation; adding it in a separate patch leaves it unused and untested in that patch which I'd rather try and avoid.

OK, fair enough. MVE at least takes a separate route to turning masked stores to truncating stores, but has fewer legal types to deal with. We can treat the MVE test that improved between revisions as the tests.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9828	This formatting looks off.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18161	Why do we need to extend the mask? So that convertFixedMaskToScalableVector shrinks to an i1 vector again?

In D108115#3048051, @dmgreen wrote:

Sorry for the delay. I missed this as something I should be looking at.

No problem, thanks for the review.

I've had a look around, and I can't see anything obvious that goes wrong with this. canCombineTruncStore is a bit overloaded between stores and masked stores - it relies on the support being symmetrical.

I see what you mean about this one but I don't see an obvious way around it other than replicating all the getTruncStoreAction etc logic for masked stores separately. This might be the right thing to do in the end though? I'm open to other suggestions as doing that might be a bit of a pain!

RKSimon added inline comments. Oct 12 2021, 9:33 AM

llvm/test/CodeGen/AArch64/sve-fixed-length-masked-stores.ll
16	Any chance that this can be cleaned up and you use the update_tests script? The check coverage for these tests looks very inconsistent - some tests only check VBITS_GE_1024/2048 and the VBITS_GE_512 is reused by 1024/2048 without a 512-bit fallback.

I've opened a separate patch to introduce the functionality for masked truncating store actions, which I will rebase this patch on when that is accepted. You can find that patch here https://reviews.llvm.org/D112536

dmgreen mentioned this in D112536: [DAG] Add functionality for masked truncating store actions. Oct 27 2021, 12:55 AM

dmgreen added inline comments. Oct 28 2021, 12:10 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18161	@paulwalker-arm do the aarch64 portions of this patch look OK to you? If so I think the rest of this patch is fine.

paulwalker-arm added inline comments. Nov 26 2021, 9:50 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9843	This will fail once D114580 lands because that patch breaks the symmetry between truncating stores and masked truncating stores. If we can drop the `ISD::FP_ROUND` part of the combine then great otherwise we either need to add support for floating point to this patch or wait for D112536.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18163	For consistency this should be `ISD::SIGN_EXTEND` given on AArch64 fixed length masks are either zero or all ones.
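
To make the difference concrete, here is a minimal sketch of what each extension does to a single boolean lane. The helper names are hypothetical; the point is only that sign-extending a true `i1` yields an all-ones lane, which matches the "zero or all ones" mask convention mentioned above, while zero-extending yields `1`.

```cpp
#include <cstdint>

// Zero-extension of an i1 lane: true becomes 1.
constexpr std::uint64_t zeroExtendBool(bool b) {
  return b ? std::uint64_t{1} : std::uint64_t{0};
}

// Sign-extension of an i1 lane: true becomes all ones.
constexpr std::uint64_t signExtendBool(bool b) {
  return b ? ~std::uint64_t{0} : std::uint64_t{0};
}
```

Both forms compare unequal to zero for an active lane, so either can produce a working predicate; the sign-extended form is simply the one consistent with the all-ones convention.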

Don't allow FP_ROUND

Change zero_extend to sign_extend.

DavidTruby marked 2 inline comments as done. Dec 9 2021, 7:02 AM

DavidTruby added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9843	I think for now we should land this without the FP_ROUND condition to get the code improvement in for LLVM 14 and then fix it up correctly in D112536 (which unfortunately needs a fair bit more work)

Harbormaster completed remote builds in B138440: Diff 393152. Dec 9 2021, 8:07 AM

RKSimon resigned from this revision. Dec 9 2021, 12:29 PM

DavidTruby marked 3 inline comments as done. Dec 13 2021, 4:13 AM

DavidTruby added inline comments. Dec 13 2021, 4:30 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18161	I think the reason we need to extend here is that fixed-length i1 vectors are not legal types, so the following function needs it to be a non-i1 vector to work correctly.

DavidTruby marked 2 inline comments as done. Dec 13 2021, 5:32 AM

peterwaller-arm accepted this revision. Dec 13 2021, 7:01 AM

This revision is now accepted and ready to land. Dec 13 2021, 7:01 AM

Sorry for the late review @DavidTruby. Although I believe the patch as is likely works fine, I've spotted an inconsistency that, if possible, would be good to fix.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9849	I have a comment below where I'm suggesting passing the original mask unmodified is likely an error.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18160–18167	On reflection I think this might be a bug introduced by the change to `DAGCombiner::visitMSTORE`. I say "might" because the requirements for `MSTORE`'s mask are not well documented. Typically a mask's type is linked to the main datatype of the operation. In this instance the MSTORE's main datatype is the value being stored and thus I believe the mask's VT should always be the result of `getSetCCResultType(Value->getValueType())`. This is something you can see the legaliser honouring, where it uses the helper function `DAGTypeLegalizer::PromoteTargetBoolean`. I think visitMSTORE should be using similar logic when folding in a truncate.

Fix mask type in DAGCombiner::visitMSTORE

paulwalker-arm added inline comments. Dec 15 2021, 4:42 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9849	You'll need to follow the same idiom as `DAGTypeLegalizer::PromoteTargetBoolean` because although `ISD::SIGN_EXTEND` is correct for AArch64, other targets might need something else.
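
A standalone sketch of the idiom being asked for: rather than hard-coding one extension, pick it from how the target represents booleans. The enums and `extendForContent` below are hypothetical stand-ins written for this example. They are modelled on my understanding of LLVM's boolean-content handling, which is an assumption here, and they are not LLVM's actual types.

```cpp
// Hypothetical stand-ins; they mirror, but are not, LLVM's own enums.
enum class BooleanContent { Undefined, ZeroOrOne, ZeroOrNegativeOne };
enum class ExtendKind { AnyExtend, ZeroExtend, SignExtend };

// Choose how to widen a boolean mask based on the target's convention.
constexpr ExtendKind extendForContent(BooleanContent content) {
  return content == BooleanContent::ZeroOrOne
             ? ExtendKind::ZeroExtend
             : content == BooleanContent::ZeroOrNegativeOne
                   ? ExtendKind::SignExtend
                   : ExtendKind::AnyExtend;
}
```

Under this model a target whose vector booleans are zero or all ones (as described for AArch64 above) gets a sign extend, while a zero-or-one target would get a zero extend from the same code path.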

Harbormaster completed remote builds in B139408: Diff 394520. Dec 15 2021, 4:46 AM

Call promoteTargetBoolean instead of using SIGN_EXTEND for mask.

Harbormaster completed remote builds in B139416: Diff 394534. Dec 15 2021, 6:17 AM

paulwalker-arm accepted this revision. Dec 17 2021, 5:39 AM

This revision was landed with ongoing or failed builds. Dec 17 2021, 7:05 AM

Closed by commit rG5c9684704d15: [DAG][sve] Lowering for VLS masked truncating stores (authored by DavidTruby).

This revision was automatically updated to reflect the committed changes.

DavidTruby added a commit: rG5c9684704d15: [DAG][sve] Lowering for VLS masked truncating stores.

Revision Contents

Path	Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	36 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	11 lines
llvm/test/CodeGen/AArch64/sve-fixed-length-masked-stores.ll	58 lines

Diff 373262

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

[...]	SDValue DAGCombiner::visitMSCATTER(SDNode *N) {

return SDValue();		return SDValue();
}		}

SDValue DAGCombiner::visitMSTORE(SDNode *N) {		SDValue DAGCombiner::visitMSTORE(SDNode *N) {
MaskedStoreSDNode *MST = cast<MaskedStoreSDNode>(N);		MaskedStoreSDNode *MST = cast<MaskedStoreSDNode>(N);
SDValue Mask = MST->getMask();		SDValue Mask = MST->getMask();
SDValue Chain = MST->getChain();		SDValue Chain = MST->getChain();
		SDValue Value = MST->getValue();
		SDValue Ptr = MST->getBasePtr();
SDLoc DL(N);		SDLoc DL(N);

// Zap masked stores with a zero mask.		// Zap masked stores with a zero mask.
if (ISD::isConstantSplatVectorAllZeros(Mask.getNode()))		if (ISD::isConstantSplatVectorAllZeros(Mask.getNode()))
return Chain;		return Chain;

// If this is a masked load with an all ones mask, we can use a unmasked load.		// If this is a masked load with an all ones mask, we can use a unmasked load.
// FIXME: Can we do this for indexed, compressing, or truncating stores?		// FIXME: Can we do this for indexed, compressing, or truncating stores?
if (ISD::isConstantSplatVectorAllOnes(Mask.getNode()) &&		if (ISD::isConstantSplatVectorAllOnes(Mask.getNode()) &&
MST->isUnindexed() && !MST->isCompressingStore() &&		MST->isUnindexed() && !MST->isCompressingStore() &&
!MST->isTruncatingStore())		!MST->isTruncatingStore())
return DAG.getStore(MST->getChain(), SDLoc(N), MST->getValue(),		return DAG.getStore(MST->getChain(), SDLoc(N), MST->getValue(),
MST->getBasePtr(), MST->getMemOperand());		MST->getBasePtr(), MST->getMemOperand());

// Try transforming N to an indexed store.		// Try transforming N to an indexed store.
if (CombineToPreIndexedLoadStore(N) || CombineToPostIndexedLoadStore(N))		if (CombineToPreIndexedLoadStore(N) || CombineToPostIndexedLoadStore(N))
return SDValue(N, 0);		return SDValue(N, 0);

		if (MST->isTruncatingStore() && MST->isUnindexed() &&
		Value.getValueType().isInteger() &&
		(!isa<ConstantSDNode>(Value) ||
		!cast<ConstantSDNode>(Value)->isOpaque())) {
		APInt TruncDemandedBits =
		APInt::getLowBitsSet(Value.getScalarValueSizeInBits(),
		MST->getMemoryVT().getScalarSizeInBits());

		// See if we can simplify the operation with
		// SimplifyDemandedBits, which only works if the value has a single use.
		if (SimplifyDemandedBits(Value, TruncDemandedBits)) {
		// Re-visit the store if anything changed and the store hasn't been merged
		// with another node (N is deleted) SimplifyDemandedBits will add Value's
		// node back to the worklist if necessary, but we also need to re-visit
		// the Store node itself.
		if (N->getOpcode() != ISD::DELETED_NODE)
		AddToWorklist(N);
		return SDValue(N, 0);
		}
		}

		// If this is an FP_ROUND or TRUNC followed by a store, fold this into a
		// truncating store. We can do this even if this is already a truncstore.
		if ((Value.getOpcode() == ISD::FP_ROUND ||
		Value.getOpcode() == ISD::TRUNCATE) &&
		Value.getNode()->hasOneUse() && MST->isUnindexed() &&
		TLI.canCombineTruncStore(Value.getOperand(0).getValueType(),
		MST->getMemoryVT(), LegalOperations)) {
		return DAG.getMaskedStore(Chain, SDLoc(N), Value.getOperand(0), Ptr,
		MST->getOffset(), MST->getMask(),
		MST->getMemoryVT(), MST->getMemOperand(),
		MST->getAddressingMode(), /*IsTruncating=*/true);
		}

return SDValue();		return SDValue();
}		}

SDValue DAGCombiner::visitMGATHER(SDNode *N) {		SDValue DAGCombiner::visitMGATHER(SDNode *N) {
MaskedGatherSDNode *MGT = cast<MaskedGatherSDNode>(N);		MaskedGatherSDNode *MGT = cast<MaskedGatherSDNode>(N);
SDValue Mask = MGT->getMask();		SDValue Mask = MGT->getMask();
SDValue Chain = MGT->getChain();		SDValue Chain = MGT->getChain();
SDValue Index = MGT->getIndex();		SDValue Index = MGT->getIndex();
SDValue Scale = MGT->getScale();		SDValue Scale = MGT->getScale();
SDValue PassThru = MGT->getPassThru();		SDValue PassThru = MGT->getPassThru();
SDValue BasePtr = MGT->getBasePtr();		SDValue BasePtr = MGT->getBasePtr();
SDLoc DL(N);		SDLoc DL(N);

// Zap gathers with a zero mask.		// Zap gathers with a zero mask.
if (ISD::isConstantSplatVectorAllZeros(Mask.getNode()))		if (ISD::isConstantSplatVectorAllZeros(Mask.getNode()))
return CombineTo(N, PassThru, MGT->getChain());		return CombineTo(N, PassThru, MGT->getChain());

if (refineUniformBase(BasePtr, Index, DAG)) {		if (refineUniformBase(BasePtr, Index, DAG)) {
SDValue Ops[] = {Chain, PassThru, Mask, BasePtr, Index, Scale};		SDValue Ops[] = {Chain, PassThru, Mask, BasePtr, Index, Scale};
return DAG.getMaskedGather(DAG.getVTList(N->getValueType(0), MVT::Other),		return DAG.getMaskedGather(DAG.getVTList(N->getValueType(0), MVT::Other),
MGT->getMemoryVT(), DL, Ops,		MGT->getMemoryVT(), DL, Ops,
MGT->getMemOperand(), MGT->getIndexType(),		MGT->getMemOperand(), MGT->getIndexType(),
MGT->getExtensionType());		MGT->getExtensionType());
[...]

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[...]	return DAG.getMaskedStore(
getPredicateForFixedLengthVector(DAG, DL, VT), Store->getMemoryVT(),		getPredicateForFixedLengthVector(DAG, DL, VT), Store->getMemoryVT(),
Store->getMemOperand(), Store->getAddressingMode(),		Store->getMemOperand(), Store->getAddressingMode(),
Store->isTruncatingStore());		Store->isTruncatingStore());
}		}

SDValue AArch64TargetLowering::LowerFixedLengthVectorMStoreToSVE(		SDValue AArch64TargetLowering::LowerFixedLengthVectorMStoreToSVE(
SDValue Op, SelectionDAG &DAG) const {		SDValue Op, SelectionDAG &DAG) const {
auto Store = cast<MaskedStoreSDNode>(Op);		auto Store = cast<MaskedStoreSDNode>(Op);
		SDValue Mask = Store->getMask();
if (Store->isTruncatingStore())
return SDValue();

SDLoc DL(Op);		SDLoc DL(Op);
EVT VT = Store->getValue().getValueType();		EVT VT = Store->getValue().getValueType();
EVT ContainerVT = getContainerForFixedLengthVector(DAG, VT);		EVT ContainerVT = getContainerForFixedLengthVector(DAG, VT);

		if (Store->isTruncatingStore()) {
		Mask = DAG.getNode(
		ISD::ZERO_EXTEND, DL,
		VT.changeVectorElementType(ContainerVT.getVectorElementType()), Mask);
		}
		Mask = convertFixedMaskToScalableVector(Mask, DAG);
auto NewValue = convertToScalableVector(DAG, ContainerVT, Store->getValue());		auto NewValue = convertToScalableVector(DAG, ContainerVT, Store->getValue());
SDValue Mask = convertFixedMaskToScalableVector(Store->getMask(), DAG);

return DAG.getMaskedStore(		return DAG.getMaskedStore(
Store->getChain(), DL, NewValue, Store->getBasePtr(), Store->getOffset(),		Store->getChain(), DL, NewValue, Store->getBasePtr(), Store->getOffset(),
Mask, Store->getMemoryVT(), Store->getMemOperand(),		Mask, Store->getMemoryVT(), Store->getMemOperand(),
Store->getAddressingMode(), Store->isTruncatingStore());		Store->getAddressingMode(), Store->isTruncatingStore());
}		}

SDValue AArch64TargetLowering::LowerFixedLengthVectorIntDivideToSVE(		SDValue AArch64TargetLowering::LowerFixedLengthVectorIntDivideToSVE(
[...]

llvm/test/CodeGen/AArch64/sve-fixed-length-masked-stores.ll

	; RUN: llc -aarch64-sve-vector-bits-min=128 < %s | FileCheck %s -D#VBYTES=16 -check-prefix=NO_SVE			; RUN: llc -aarch64-sve-vector-bits-min=128 < %s | FileCheck %s -D#VBYTES=16 -check-prefix=NO_SVE
	; RUN: llc -aarch64-sve-vector-bits-min=256 < %s | FileCheck %s -D#VBYTES=32 -check-prefixes=CHECK			; RUN: llc -aarch64-sve-vector-bits-min=256 < %s | FileCheck %s -D#VBYTES=32 -check-prefixes=CHECK
	; RUN: llc -aarch64-sve-vector-bits-min=384 < %s | FileCheck %s -D#VBYTES=32 -check-prefixes=CHECK			; RUN: llc -aarch64-sve-vector-bits-min=384 < %s | FileCheck %s -D#VBYTES=32 -check-prefixes=CHECK
	; RUN: llc -aarch64-sve-vector-bits-min=512 < %s | FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512			; RUN: llc -aarch64-sve-vector-bits-min=512 < %s | FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512
	; RUN: llc -aarch64-sve-vector-bits-min=640 < %s | FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512			; RUN: llc -aarch64-sve-vector-bits-min=640 < %s | FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512
	; RUN: llc -aarch64-sve-vector-bits-min=768 < %s | FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512			; RUN: llc -aarch64-sve-vector-bits-min=768 < %s | FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512
	; RUN: llc -aarch64-sve-vector-bits-min=896 < %s | FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512			; RUN: llc -aarch64-sve-vector-bits-min=896 < %s | FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512
	; RUN: llc -aarch64-sve-vector-bits-min=1024 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512			; RUN: llc -aarch64-sve-vector-bits-min=1024 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512
	; RUN: llc -aarch64-sve-vector-bits-min=1152 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512			; RUN: llc -aarch64-sve-vector-bits-min=1152 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512
	; RUN: llc -aarch64-sve-vector-bits-min=1280 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512			; RUN: llc -aarch64-sve-vector-bits-min=1280 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512
	; RUN: llc -aarch64-sve-vector-bits-min=1408 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512			; RUN: llc -aarch64-sve-vector-bits-min=1408 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512
	; RUN: llc -aarch64-sve-vector-bits-min=1536 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512			; RUN: llc -aarch64-sve-vector-bits-min=1536 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512
	; RUN: llc -aarch64-sve-vector-bits-min=1664 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512			; RUN: llc -aarch64-sve-vector-bits-min=1664 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512
	; RUN: llc -aarch64-sve-vector-bits-min=1792 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512			; RUN: llc -aarch64-sve-vector-bits-min=1792 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512
	; RUN: llc -aarch64-sve-vector-bits-min=1920 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512			; RUN: llc -aarch64-sve-vector-bits-min=1920 < %s | FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512
	; RUN: llc -aarch64-sve-vector-bits-min=2048 < %s | FileCheck %s -D#VBYTES=256 -check-prefixes=CHECK,VBITS_GE_2048,VBITS_GE_1024,VBITS_GE_512			; RUN: llc -aarch64-sve-vector-bits-min=2048 < %s | FileCheck %s -D#VBYTES=256 -check-prefixes=CHECK,VBITS_GE_2048,VBITS_GE_1024,VBITS_GE_512

	target triple = "aarch64-unknown-linux-gnu"			target triple = "aarch64-unknown-linux-gnu"

	; Don't use SVE when its registers are no bigger than NEON.			; Don't use SVE when its registers are no bigger than NEON.
	; NO_SVE-NOT: ptrue			; NO_SVE-NOT: ptrue

	;;			;;
	;; Masked Stores			;; Masked Stores
	[...]
	}			}

	define void @masked_store_trunc_v8i64i8(<8 x i64>* %ap, <8 x i64>* %bp, <8 x i8>* %dest) #0 {			define void @masked_store_trunc_v8i64i8(<8 x i64>* %ap, <8 x i64>* %bp, <8 x i8>* %dest) #0 {
	; CHECK-LABEL: masked_store_trunc_v8i64i8:			; CHECK-LABEL: masked_store_trunc_v8i64i8:
	; VBITS_GE_512: ptrue p[[P0:[0-9]+]].d, vl8			; VBITS_GE_512: ptrue p[[P0:[0-9]+]].d, vl8
	; VBITS_GE_512-NEXT: ld1d { [[Z0:z[0-9]+]].d }, p0/z, [x0]			; VBITS_GE_512-NEXT: ld1d { [[Z0:z[0-9]+]].d }, p0/z, [x0]
	; VBITS_GE_512-NEXT: ld1d { [[Z1:z[0-9]+]].d }, p0/z, [x1]			; VBITS_GE_512-NEXT: ld1d { [[Z1:z[0-9]+]].d }, p0/z, [x1]
	; VBITS_GE_512-NEXT: cmpeq p[[P1:[0-9]+]].d, p[[P0]]/z, [[Z0]].d, [[Z1]].d			; VBITS_GE_512-NEXT: cmpeq p[[P1:[0-9]+]].d, p[[P0]]/z, [[Z0]].d, [[Z1]].d
	; VBITS_GE_512-DAG: uzp1 [[Z1]].s, [[Z1]].s, [[Z1]].s			; VBITS_GE_512-NEXT: st1b { [[Z0]].d }, p[[P1]], [x{{[0-9]+}}]
	; VBITS_GE_512-DAG: uzp1 [[Z1]].h, [[Z1]].h, [[Z1]].h
	; VBITS_GE_512-DAG: uzp1 [[Z1]].b, [[Z1]].b, [[Z1]].b
	; VBITS_GE_512-DAG: cmpne p[[P2:[0-9]+]].b, p{{[0-9]+}}/z, [[Z1]].b, #0
	; VBITS_GE_512-DAG: uzp1 [[Z0]].s, [[Z0]].s, [[Z0]].s
	; VBITS_GE_512-DAG: uzp1 [[Z0]].h, [[Z0]].h, [[Z0]].h
	; VBITS_GE_512-DAG: uzp1 [[Z0]].b, [[Z0]].b, [[Z0]].b
	; VBITS_GE_512-NEXT: st1b { [[Z0]].b }, p[[P2]], [x{{[0-9]+}}]
	; VBITS_GE_512-NEXT: ret			; VBITS_GE_512-NEXT: ret
	%a = load <8 x i64>, <8 x i64>* %ap			%a = load <8 x i64>, <8 x i64>* %ap
	%b = load <8 x i64>, <8 x i64>* %bp			%b = load <8 x i64>, <8 x i64>* %bp
	%mask = icmp eq <8 x i64> %a, %b			%mask = icmp eq <8 x i64> %a, %b
	%val = trunc <8 x i64> %a to <8 x i8>			%val = trunc <8 x i64> %a to <8 x i8>
	call void @llvm.masked.store.v8i8(<8 x i8> %val, <8 x i8>* %dest, i32 8, <8 x i1> %mask)			call void @llvm.masked.store.v8i8(<8 x i8> %val, <8 x i8>* %dest, i32 8, <8 x i1> %mask)
	ret void			ret void
	}			}

	define void @masked_store_trunc_v8i64i16(<8 x i64>* %ap, <8 x i64>* %bp, <8 x i16>* %dest) #0 {			define void @masked_store_trunc_v8i64i16(<8 x i64>* %ap, <8 x i64>* %bp, <8 x i16>* %dest) #0 {
	; CHECK-LABEL: masked_store_trunc_v8i64i16:			; CHECK-LABEL: masked_store_trunc_v8i64i16:
	; VBITS_GE_512: ptrue p[[P0:[0-9]+]].d, vl8			; VBITS_GE_512: ptrue p[[P0:[0-9]+]].d, vl8
	; VBITS_GE_512-NEXT: ld1d { [[Z0:z[0-9]+]].d }, p0/z, [x0]			; VBITS_GE_512-NEXT: ld1d { [[Z0:z[0-9]+]].d }, p0/z, [x0]
	; VBITS_GE_512-NEXT: ld1d { [[Z1:z[0-9]+]].d }, p0/z, [x1]			; VBITS_GE_512-NEXT: ld1d { [[Z1:z[0-9]+]].d }, p0/z, [x1]
	; VBITS_GE_512-DAG: ptrue p{{[0-9]+}}.h, vl8			; VBITS_GE_512-NEXT: cmpeq p[[P1:[0-9]+]].d, p[[P0]]/z, [[Z0]].d, [[Z1]].d
	; VBITS_GE_512-DAG: cmpeq p[[P1:[0-9]+]].d, p[[P0]]/z, [[Z0]].d, [[Z1]].d			; VBITS_GE_512-NEXT: st1h { [[Z0]].d }, p[[P1]], [x{{[0-9]+}}]
	; VBITS_GE_512-NEXT: mov [[Z1]].d, p[[P0]]/z, #-1
	; VBITS_GE_512-DAG: uzp1 [[Z1]].s, [[Z1]].s, [[Z1]].s
	; VBITS_GE_512-DAG: uzp1 [[Z1]].h, [[Z1]].h, [[Z1]].h
	; VBITS_GE_512-DAG: cmpne p[[P2:[0-9]+]].h, p{{[0-9]+}}/z, [[Z1]].h, #0
	; VBITS_GE_512-DAG: uzp1 [[Z0]].s, [[Z0]].s, [[Z0]].s
	; VBITS_GE_512-DAG: uzp1 [[Z0]].h, [[Z0]].h, [[Z0]].h
	; VBITS_GE_512-NEXT: st1h { [[Z0]].h }, p[[P2]], [x{{[0-9]+}}]
	; VBITS_GE_512-NEXT: ret			; VBITS_GE_512-NEXT: ret
	%a = load <8 x i64>, <8 x i64>* %ap			%a = load <8 x i64>, <8 x i64>* %ap
	%b = load <8 x i64>, <8 x i64>* %bp			%b = load <8 x i64>, <8 x i64>* %bp
	%mask = icmp eq <8 x i64> %a, %b			%mask = icmp eq <8 x i64> %a, %b
	%val = trunc <8 x i64> %a to <8 x i16>			%val = trunc <8 x i64> %a to <8 x i16>
	call void @llvm.masked.store.v8i16(<8 x i16> %val, <8 x i16>* %dest, i32 8, <8 x i1> %mask)			call void @llvm.masked.store.v8i16(<8 x i16> %val, <8 x i16>* %dest, i32 8, <8 x i1> %mask)
	ret void			ret void
	}			}

	define void @masked_store_trunc_v8i64i32(<8 x i64>* %ap, <8 x i64>* %bp, <8 x i32>* %dest) #0 {			define void @masked_store_trunc_v8i64i32(<8 x i64>* %ap, <8 x i64>* %bp, <8 x i32>* %dest) #0 {
	; CHECK-LABEL: masked_store_trunc_v8i64i32:			; CHECK-LABEL: masked_store_trunc_v8i64i32:
	; VBITS_GE_512: ptrue p[[P0:[0-9]+]].d, vl8			; VBITS_GE_512: ptrue p[[P0:[0-9]+]].d, vl8
	; VBITS_GE_512-NEXT: ld1d { [[Z0:z[0-9]+]].d }, p0/z, [x0]			; VBITS_GE_512-NEXT: ld1d { [[Z0:z[0-9]+]].d }, p0/z, [x0]
	; VBITS_GE_512-NEXT: ld1d { [[Z1:z[0-9]+]].d }, p0/z, [x1]			; VBITS_GE_512-NEXT: ld1d { [[Z1:z[0-9]+]].d }, p0/z, [x1]
	; VBITS_GE_512-DAG: ptrue p{{[0-9]+}}.s, vl8			; VBITS_GE_512-NEXT: cmpeq p[[P1:[0-9]+]].d, p[[P0]]/z, [[Z0]].d, [[Z1]].d
	; VBITS_GE_512-DAG: cmpeq p[[P1:[0-9]+]].d, p[[P0]]/z, [[Z0]].d, [[Z1]].d			; VBITS_GE_512-NEXT: st1w { [[Z0]].d }, p[[P1]], [x{{[0-9]+}}]
	; VBITS_GE_512-NEXT: mov [[Z1]].d, p[[P0]]/z, #-1
	; VBITS_GE_512-DAG: uzp1 [[Z1]].s, [[Z1]].s, [[Z1]].s
	; VBITS_GE_512-DAG: cmpne p[[P2:[0-9]+]].s, p{{[0-9]+}}/z, [[Z1]].s, #0
	; VBITS_GE_512-DAG: uzp1 [[Z0]].s, [[Z0]].s, [[Z0]].s
	; VBITS_GE_512-NEXT: st1w { [[Z0]].s }, p[[P2]], [x{{[0-9]+}}]
	; VBITS_GE_512-NEXT: ret			; VBITS_GE_512-NEXT: ret
	%a = load <8 x i64>, <8 x i64>* %ap			%a = load <8 x i64>, <8 x i64>* %ap
	%b = load <8 x i64>, <8 x i64>* %bp			%b = load <8 x i64>, <8 x i64>* %bp
	%mask = icmp eq <8 x i64> %a, %b			%mask = icmp eq <8 x i64> %a, %b
	%val = trunc <8 x i64> %a to <8 x i32>			%val = trunc <8 x i64> %a to <8 x i32>
	call void @llvm.masked.store.v8i32(<8 x i32> %val, <8 x i32>* %dest, i32 8, <8 x i1> %mask)			call void @llvm.masked.store.v8i32(<8 x i32> %val, <8 x i32>* %dest, i32 8, <8 x i1> %mask)
	ret void			ret void
	}			}

	define void @masked_store_trunc_v16i32i8(<16 x i32>* %ap, <16 x i32>* %bp, <16 x i8>* %dest) #0 {			define void @masked_store_trunc_v16i32i8(<16 x i32>* %ap, <16 x i32>* %bp, <16 x i8>* %dest) #0 {
	; CHECK-LABEL: masked_store_trunc_v16i32i8:			; CHECK-LABEL: masked_store_trunc_v16i32i8:
	; VBITS_GE_512: ptrue p[[P0:[0-9]+]].s, vl16			; VBITS_GE_512: ptrue p[[P0:[0-9]+]].s, vl16
	; VBITS_GE_512-NEXT: ld1w { [[Z0:z[0-9]+]].s }, p0/z, [x0]			; VBITS_GE_512-NEXT: ld1w { [[Z0:z[0-9]+]].s }, p0/z, [x0]
	; VBITS_GE_512-NEXT: ld1w { [[Z1:z[0-9]+]].s }, p0/z, [x1]			; VBITS_GE_512-NEXT: ld1w { [[Z1:z[0-9]+]].s }, p0/z, [x1]
	; VBITS_GE_512-DAG: ptrue p{{[0-9]+}}.b, vl16			; VBITS_GE_512-NEXT: cmpeq p[[P1:[0-9]+]].s, p[[P0]]/z, [[Z0]].s, [[Z1]].s
	; VBITS_GE_512-DAG: cmpeq p[[P1:[0-9]+]].s, p[[P0]]/z, [[Z0]].s, [[Z1]].s			; VBITS_GE_512-NEXT: st1b { [[Z0]].s }, p[[P1]], [x{{[0-9]+}}]
	; VBITS_GE_512-NEXT: mov [[Z1]].s, p[[P0]]/z, #-1
	; VBITS_GE_512-DAG: uzp1 [[Z1]].h, [[Z1]].h, [[Z1]].h
	; VBITS_GE_512-DAG: uzp1 [[Z1]].b, [[Z1]].b, [[Z1]].b
	; VBITS_GE_512-DAG: cmpne p[[P2:[0-9]+]].b, p{{[0-9]+}}/z, [[Z1]].b, #0
	; VBITS_GE_512-DAG: uzp1 [[Z0]].h, [[Z0]].h, [[Z0]].h
	; VBITS_GE_512-DAG: uzp1 [[Z0]].b, [[Z0]].b, [[Z0]].b
	; VBITS_GE_512-NEXT: st1b { [[Z0]].b }, p[[P2]], [x{{[0-9]+}}]
	; VBITS_GE_512-NEXT: ret			; VBITS_GE_512-NEXT: ret
	%a = load <16 x i32>, <16 x i32>* %ap			%a = load <16 x i32>, <16 x i32>* %ap
	%b = load <16 x i32>, <16 x i32>* %bp			%b = load <16 x i32>, <16 x i32>* %bp
	%mask = icmp eq <16 x i32> %a, %b			%mask = icmp eq <16 x i32> %a, %b
	%val = trunc <16 x i32> %a to <16 x i8>			%val = trunc <16 x i32> %a to <16 x i8>
	call void @llvm.masked.store.v16i8(<16 x i8> %val, <16 x i8>* %dest, i32 8, <16 x i1> %mask)			call void @llvm.masked.store.v16i8(<16 x i8> %val, <16 x i8>* %dest, i32 8, <16 x i1> %mask)
	ret void			ret void
	}			}

	define void @masked_store_trunc_v16i32i16(<16 x i32>* %ap, <16 x i32>* %bp, <16 x i16>* %dest) #0 {			define void @masked_store_trunc_v16i32i16(<16 x i32>* %ap, <16 x i32>* %bp, <16 x i16>* %dest) #0 {
	; CHECK-LABEL: masked_store_trunc_v16i32i16:			; CHECK-LABEL: masked_store_trunc_v16i32i16:
	; VBITS_GE_512: ptrue p[[P0:[0-9]+]].s, vl16			; VBITS_GE_512: ptrue p[[P0:[0-9]+]].s, vl16
	; VBITS_GE_512-NEXT: ld1w { [[Z0:z[0-9]+]].s }, p0/z, [x0]			; VBITS_GE_512-NEXT: ld1w { [[Z0:z[0-9]+]].s }, p0/z, [x0]
	; VBITS_GE_512-NEXT: ld1w { [[Z1:z[0-9]+]].s }, p0/z, [x1]			; VBITS_GE_512-NEXT: ld1w { [[Z1:z[0-9]+]].s }, p0/z, [x1]
	; VBITS_GE_512-DAG: ptrue p{{[0-9]+}}.h, vl16			; VBITS_GE_512-NEXT: cmpeq p[[P1:[0-9]+]].s, p[[P0]]/z, [[Z0]].s, [[Z1]].s
	; VBITS_GE_512-DAG: cmpeq p[[P1:[0-9]+]].s, p[[P0]]/z, [[Z0]].s, [[Z1]].s			; VBITS_GE_512-NEXT: st1h { [[Z0]].s }, p[[P1]], [x{{[0-9]+}}]
	; VBITS_GE_512-NEXT: mov [[Z1]].s, p[[P0]]/z, #-1
	; VBITS_GE_512-DAG: uzp1 [[Z1]].h, [[Z1]].h, [[Z1]].h
	; VBITS_GE_512-DAG: cmpne p[[P2:[0-9]+]].h, p{{[0-9]+}}/z, [[Z1]].h, #0
	; VBITS_GE_512-DAG: uzp1 [[Z0]].h, [[Z0]].h, [[Z0]].h
	; VBITS_GE_512-NEXT: st1h { [[Z0]].h }, p[[P2]], [x{{[0-9]+}}]
	; VBITS_GE_512-NEXT: ret			; VBITS_GE_512-NEXT: ret
	%a = load <16 x i32>, <16 x i32>* %ap			%a = load <16 x i32>, <16 x i32>* %ap
	%b = load <16 x i32>, <16 x i32>* %bp			%b = load <16 x i32>, <16 x i32>* %bp
	%mask = icmp eq <16 x i32> %a, %b			%mask = icmp eq <16 x i32> %a, %b
	%val = trunc <16 x i32> %a to <16 x i16>			%val = trunc <16 x i32> %a to <16 x i16>
	call void @llvm.masked.store.v16i16(<16 x i16> %val, <16 x i16>* %dest, i32 8, <16 x i1> %mask)			call void @llvm.masked.store.v16i16(<16 x i16> %val, <16 x i16>* %dest, i32 8, <16 x i1> %mask)
	ret void			ret void
	}			}

	define void @masked_store_trunc_v32i16i8(<32 x i16>* %ap, <32 x i16>* %bp, <32 x i8>* %dest) #0 {			define void @masked_store_trunc_v32i16i8(<32 x i16>* %ap, <32 x i16>* %bp, <32 x i8>* %dest) #0 {
	; CHECK-LABEL: masked_store_trunc_v32i16i8:			; CHECK-LABEL: masked_store_trunc_v32i16i8:
	; VBITS_GE_512: ptrue p[[P0:[0-9]+]].h, vl32			; VBITS_GE_512: ptrue p[[P0:[0-9]+]].h, vl32
	; VBITS_GE_512-NEXT: ld1h { [[Z0:z[0-9]+]].h }, p0/z, [x0]			; VBITS_GE_512-NEXT: ld1h { [[Z0:z[0-9]+]].h }, p0/z, [x0]
	; VBITS_GE_512-NEXT: ld1h { [[Z1:z[0-9]+]].h }, p0/z, [x1]			; VBITS_GE_512-NEXT: ld1h { [[Z1:z[0-9]+]].h }, p0/z, [x1]
	; VBITS_GE_512-DAG: ptrue p{{[0-9]+}}.b, vl32			; VBITS_GE_512-NEXT: cmpeq p[[P1:[0-9]+]].h, p[[P0]]/z, [[Z0]].h, [[Z1]].h
	; VBITS_GE_512-DAG: cmpeq p[[P1:[0-9]+]].h, p[[P0]]/z, [[Z0]].h, [[Z1]].h			; VBITS_GE_512-NEXT: st1b { [[Z0]].h }, p[[P1]], [x{{[0-9]+}}]
	; VBITS_GE_512-NEXT: mov [[Z1]].h, p[[P0]]/z, #-1
	; VBITS_GE_512-DAG: uzp1 [[Z1]].b, [[Z1]].b, [[Z1]].b
	; VBITS_GE_512-DAG: cmpne p[[P2:[0-9]+]].b, p{{[0-9]+}}/z, [[Z1]].b, #0
	; VBITS_GE_512-DAG: uzp1 [[Z0]].b, [[Z0]].b, [[Z0]].b
	; VBITS_GE_512-NEXT: st1b { [[Z0]].b }, p[[P2]], [x{{[0-9]+}}]
	; VBITS_GE_512-NEXT: ret			; VBITS_GE_512-NEXT: ret
	%a = load <32 x i16>, <32 x i16>* %ap			%a = load <32 x i16>, <32 x i16>* %ap
	%b = load <32 x i16>, <32 x i16>* %bp			%b = load <32 x i16>, <32 x i16>* %bp
	%mask = icmp eq <32 x i16> %a, %b			%mask = icmp eq <32 x i16> %a, %b
	%val = trunc <32 x i16> %a to <32 x i8>			%val = trunc <32 x i16> %a to <32 x i8>
	call void @llvm.masked.store.v32i8(<32 x i8> %val, <32 x i8>* %dest, i32 8, <32 x i1> %mask)			call void @llvm.masked.store.v32i8(<32 x i8> %val, <32 x i8>* %dest, i32 8, <32 x i1> %mask)
	ret void			ret void
	}			}