This is an archive of the discontinued LLVM Phabricator instance.

Differential D108115

[DAG][sve] Lowering for VLS masked truncating stores
ClosedPublic

Authored by DavidTruby on Aug 16 2021, 3:39 AM.

Details

Reviewers

dmgreen
bsmith
efriedma
SjoerdMeijer
peterwaller-arm
paulwalker-arm
craig.topper
RKSimon

Commits

rG5c9684704d15: [DAG][sve] Lowering for VLS masked truncating stores

Summary

This extends the custom lowering for truncating stores on
fixed length vectors in SVE to support masked truncating stores.
It also adds a DAG combine for truncates followed by masked
stores.
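
To make the summary concrete, here is a minimal standalone sketch of what a masked truncating store does, lane by lane. It is a plain C++ reference model, not LLVM code: the function name `maskedTruncStore` and the choice of 64-bit values narrowed to 8-bit memory elements (mirroring the `masked_store_trunc_v8i64i8` test in this revision) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference semantics of "truncate, then masked store", one lane at a time.
// Hypothetical helper for illustration only; not part of LLVM.
void maskedTruncStore(const std::vector<std::uint64_t> &Value,
                      const std::vector<bool> &Mask,
                      std::vector<std::uint8_t> &Dest) {
  assert(Value.size() == Mask.size() && Value.size() <= Dest.size());
  for (std::size_t I = 0; I < Value.size(); ++I) {
    // Inactive lanes leave memory untouched.
    if (!Mask[I])
      continue;
    // Active lanes store only the low 8 bits of the wide value.
    Dest[I] = static_cast<std::uint8_t>(Value[I]);
  }
}
```

Folding the truncate into the store is what lets a target with a native narrowing store emit a single instruction here, such as the `st1b { z.d }` form checked by the updated test, instead of narrowing the data and the mask separately first.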

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

DavidTruby created this revision. Aug 16 2021, 3:39 AM

Herald added a reviewer: efriedma. Aug 16 2021, 3:39 AM

Herald added subscribers: ctetreau, ecnelises, psnobl and 2 others.

DavidTruby requested review of this revision. Aug 16 2021, 3:39 AM

Herald added a project: Restricted Project. Aug 16 2021, 3:39 AM

Herald added a subscriber: llvm-commits.

DavidTruby added a reviewer: SjoerdMeijer. Aug 16 2021, 3:39 AM

DavidTruby added inline comments.

llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll
1462 ↗	(On Diff #366581)	These changes aren't correct, I have pinged @dmgreen and @SjoerdMeijer for review to see if they know what's happening here. My suspicion is that there's a target specific combine in Thumb2 for masked truncating stores that my general combine is blocking; I guess the easy fix here is to add an exemption for Thumb2 in the TLI hook but I'm not sure if that's the right thing to do?

Harbormaster completed remote builds in B119675: Diff 366581. Aug 16 2021, 4:01 AM

peterwaller-arm added a reviewer: peterwaller-arm. Aug 17 2021, 5:36 AM

Matt added a subscriber: Matt. Aug 17 2021, 9:24 AM

ping

SjoerdMeijer added inline comments. Sep 7 2021, 6:40 AM

llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll
1462 ↗	(On Diff #366581)	Sorry for the delay! Yeah, this extra codegen is not ideal (I think we agree this is a perf regression, not a correctness issue, but that aside). If we could avoid this, that would be best. As I understand it, this would involve adding this target hook to the ARM backend: `virtual bool canCombineTruncStore(EVT ValVT, EVT MemVT, bool LegalOperations) const override { ... }` Would you mind adding this to the ARM backend? I have no preference whether you include that here (I think that would make sense) or do it separately and create a dependent patch. But I think it would be good if we can see evidence that this regression disappears before committing this.

Sorry I didn't see this. I get a lot of phabricator spam and this wasn't very obvious from the title. It should probably have [DAG], [llvm] doesn't say much when the whole project is llvm ;)

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10070	I'm surprised to see FP_ROUND here. I guess it gets handled in the same way as a normal non-masked trunc store?
llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll
1462 ↗	(On Diff #366581)	We don't want canCombineTruncStore. I think you need to add demanded bits for the truncating masked store. As in https://github.com/llvm/llvm-project/blob/b50a60c234433545fc1c9b39f193373f560ea869/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L18115

DavidTruby added inline comments. Sep 7 2021, 7:22 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10070	I just took this from the existing non-masked truncating store code. I think it gets handled by canCombineTruncStore checking if the truncating store has been marked legal or custom, which it won't have been if the architecture doesn't support floating point truncating stores.

Fix Thumb2 regression

DavidTruby added inline comments. Sep 7 2021, 8:42 AM

llvm/test/CodeGen/Thumb2/mve-satmul-loops.ll
1462 ↗	(On Diff #366581)	Thanks for the pointer, adding this seems to have fixed the regression
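
The demanded-bits idea behind this fix can be illustrated outside LLVM. The following is a minimal sketch, assuming lanes of at most 64 bits: `lowBitsSet` is a hypothetical stand-in for `APInt::getLowBitsSet`, and `storedBits` is an illustrative helper; neither is LLVM API. It shows that a truncating store only observes the low bits of the stored value, so anything that only affects the high bits can be simplified away.

```cpp
#include <cassert>
#include <cstdint>

// Mask with the low `LoBits` bits set, for a value `BitWidth` bits wide.
// Hypothetical stand-in for APInt::getLowBitsSet, limited to 64-bit values.
std::uint64_t lowBitsSet(unsigned BitWidth, unsigned LoBits) {
  assert(BitWidth <= 64 && LoBits <= BitWidth);
  if (LoBits == 0)
    return 0;
  if (LoBits == 64)
    return ~std::uint64_t{0};
  return (std::uint64_t{1} << LoBits) - 1;
}

// The bits a truncating store actually writes for one lane: only the low
// MemBits bits of the ValueBits-wide value are demanded.
std::uint64_t storedBits(std::uint64_t Value, unsigned ValueBits,
                         unsigned MemBits) {
  return Value & lowBitsSet(ValueBits, MemBits);
}
```

For an i64 value stored as i8, the demanded mask is `0xFF`, so `0x1234` and `0x34` are indistinguishable to the store.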

Harbormaster completed remote builds in B122883: Diff 371092. Sep 7 2021, 9:52 AM

Thanks for fixing the MVE issues. Most of the changes here don't look SVE related, being generic DAG combines. It would probably be best to split them into a separate review to keep logically separable additions in different reviews in case there are issues with them.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10118	Is this tested anywhere?

Remove unused and untested code

In D108115#3001669, @dmgreen wrote:

Thanks for fixing the MVE issues. Most of the changes here don't look SVE related, being generic DAG combines. It would probably be best to split them into a separate review to keep logically separable additions in different reviews in case there are issues with them.

I'm nervous about splitting the DAG combine into a separate patch because it's currently only triggered by SVE code generation; adding it in a separate patch leaves it unused and untested in that patch which I'd rather try and avoid.

Harbormaster completed remote builds in B124426: Diff 373262. Sep 17 2021, 10:50 AM

I've reverted 734708e04f84b72f1ae7c8b35c002b8bf97dc064 to fix the 2 stage vls bot while this is in review. Please reland both when this is approved.

peterwaller-arm added a reviewer: paulwalker-arm. Oct 4 2021, 6:37 AM

DavidTruby retitled this revision from [llvm][sve] Lowering for VLS masked truncating stores to [DAG][sve] Lowering for VLS masked truncating stores. Oct 7 2021, 3:17 AM

Sorry for the delay. I missed this as something I should be looking at.

I've had a look around, and I can't see anything obvious that goes wrong with this. canCombineTruncStore is a bit overloaded between stores and masked stores - it relies on the support being symmetrical.

I'm nervous about splitting the DAG combine into a separate patch because it's currently only triggered by SVE code generation; adding it in a separate patch leaves it unused and untested in that patch which I'd rather try and avoid.

OK, fair enough. MVE at least takes a separate route to turning masked stores into truncating stores, but has fewer legal types to deal with. We can treat the MVE test that improved between revisions as the test coverage.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10076	This formatting looks off.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18906	Why do we need to extend the mask? So that convertFixedMaskToScalableVector shrinks to a i1 vector again?

In D108115#3048051, @dmgreen wrote:

Sorry for the delay. I missed this as something I should be looking at.

No problem, thanks for the review.

I've had a look around, and I can't see anything obvious that goes wrong with this. canCombineTruncStore is a bit overloaded between stores and masked stores - it relies on the support being symmetrical.

I see what you mean about this one but I don't see an obvious way around it other than replicating all the getTruncStoreAction etc logic for masked stores separately. This might be the right thing to do in the end though? I'm open to other suggestions as doing that might be a bit of a pain!

RKSimon added inline comments. Oct 12 2021, 9:33 AM

llvm/test/CodeGen/AArch64/sve-fixed-length-masked-stores.ll
8–16	Any chance that this can be cleaned up and you use the update_tests script? The check coverage for these tests looks very inconsistent - some tests only check VBITS_GE_1024/2048 and the VBITS_GE_512 is reused by 1024/2048 without a 512-bit fallback.

I've opened a separate patch to introduce the functionality for masked truncating store actions, which I will rebase this patch on when that is accepted. You can find that patch here https://reviews.llvm.org/D112536

dmgreen mentioned this in D112536: [DAG] Add functionality for masked truncating store actions. Oct 27 2021, 12:55 AM

dmgreen added inline comments. Oct 28 2021, 12:10 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18906	@paulwalker-arm do the aarch64 portions of this patch look OK to you? If so I think the rest of this patch is fine.

paulwalker-arm added inline comments. Nov 26 2021, 9:50 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10091	This will fail once D114580 lands because that patch breaks the symmetry between truncating stores and masked truncating stores. If we can drop the `ISD::FP_ROUND` part of the combine then great otherwise we either need to add support for floating point to this patch or wait for D112536.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18908	For consistency this should be `ISD::SIGN_EXTEND`, given that on AArch64 fixed length masks are either zero or all ones.

Don't allow FP_ROUND

Change zero_extend to sign_extend.

DavidTruby marked 2 inline comments as done. Dec 9 2021, 7:02 AM

DavidTruby added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10091	I think for now we should land this without the FP_ROUND condition to get the code improvement in for LLVM 14 and then fix it up correctly in D112536 (which unfortunately needs a fair bit more work)

Harbormaster completed remote builds in B138440: Diff 393152. Dec 9 2021, 8:07 AM

RKSimon resigned from this revision. Dec 9 2021, 12:29 PM

DavidTruby marked 3 inline comments as done. Dec 13 2021, 4:13 AM

DavidTruby added inline comments. Dec 13 2021, 4:30 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18906	I think the reason we need to extend here is that fixed-length i1 vectors are not legal types, so the following function needs it to be a non-i1 vector to work correctly.

DavidTruby marked 2 inline comments as done. Dec 13 2021, 5:32 AM

peterwaller-arm accepted this revision. Dec 13 2021, 7:01 AM

This revision is now accepted and ready to land. Dec 13 2021, 7:01 AM

Sorry for the late review @DavidTruby. Although I believe the patch as is likely works fine, I've spotted an inconsistency that, if possible, would be good to fix.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10097	I have a comment below where I'm suggesting passing the original mask unmodified is likely an error.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18905–18906	On reflection I think this might be a bug introduced by the change to `DAGCombiner::visitMSTORE`. I say "might" because the requirements for `MSTORE`'s mask are not well documented. Typically a mask's type is linked to the main datatype of the operation. In this instance the MSTORE's main datatype is the value being stored and thus I believe the mask's VT should always be the result of `getSetCCResultType(Value->getValueType())`. This is something you can see the legaliser honouring, where it uses the helper function `DAGTypeLegalizer::PromoteTargetBoolean`. I think visitMSTORE should be using similar logic when folding in a truncate.

Fix mask type in DAGCombiner::visitMSTORE

paulwalker-arm added inline comments. Dec 15 2021, 4:42 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
10097	You'll need to follow the same idiom as `DAGTypeLegalizer::PromoteTargetBoolean` because although `ISD::SIGN_EXTEND` is correct for AArch64, other targets might need something else.
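
The point about other targets can be sketched with a toy model of the `getExtendForContent(getBooleanContents(ValVT))` idiom used by `PromoteTargetBoolean`. This is a standalone illustration, not LLVM code: the enums and the `promoteBool` helper are hypothetical, and the mapping from boolean content to extension kind is assumed to match what `TargetLowering::getExtendForContent` does.

```cpp
#include <cassert>
#include <cstdint>

// Toy mirror of a target's boolean representation and the extension it
// implies. Names are illustrative only.
enum class BooleanContent { Undefined, ZeroOrOne, ZeroOrNegativeOne };
enum class ExtendKind { AnyExtend, ZeroExtend, SignExtend };

ExtendKind extendForContent(BooleanContent C) {
  switch (C) {
  case BooleanContent::ZeroOrOne:
    return ExtendKind::ZeroExtend;
  case BooleanContent::ZeroOrNegativeOne:
    return ExtendKind::SignExtend;
  case BooleanContent::Undefined:
    return ExtendKind::AnyExtend;
  }
  return ExtendKind::AnyExtend;
}

// Widen a single i1 mask lane to `Bits` bits using the target's convention.
std::uint64_t promoteBool(bool Bit, unsigned Bits, BooleanContent C) {
  assert(Bits >= 1 && Bits <= 64);
  if (!Bit)
    return 0;
  if (extendForContent(C) == ExtendKind::SignExtend)
    return Bits == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << Bits) - 1;
  // Zero-extend yields 1. Any-extend leaves the upper bits unspecified;
  // this sketch arbitrarily picks zeros for them.
  return 1;
}
```

With an all-ones convention, as described for AArch64 fixed length masks in this review, a true lane widened to 8 bits becomes `0xFF`; with a zero-or-one convention it becomes `0x01`. Hard-coding `ISD::SIGN_EXTEND` in a generic combine would therefore only suit the first kind of target.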

Harbormaster completed remote builds in B139408: Diff 394520. Dec 15 2021, 4:46 AM

Call promoteTargetBoolean instead of using SIGN_EXTEND for mask.

Harbormaster completed remote builds in B139416: Diff 394534. Dec 15 2021, 6:17 AM

paulwalker-arm accepted this revision. Dec 17 2021, 5:39 AM

This revision was landed with ongoing or failed builds. Dec 17 2021, 7:05 AM

Closed by commit rG5c9684704d15: [DAG][sve] Lowering for VLS masked truncating stores (authored by DavidTruby).

This revision was automatically updated to reflect the committed changes.

DavidTruby added a commit: rG5c9684704d15: [DAG][sve] Lowering for VLS masked truncating stores.

Revision Contents

Path	Size
llvm/include/llvm/CodeGen/TargetLowering.h	14 lines
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	38 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp	6 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	5 lines
llvm/test/CodeGen/AArch64/sve-fixed-length-masked-stores.ll	126 lines

Diff 395122

llvm/include/llvm/CodeGen/TargetLowering.h

     if (isVec)
       return BooleanVectorContents;
     return isFloat ? BooleanFloatContents : BooleanContents;
   }

   BooleanContent getBooleanContents(EVT Type) const {
     return getBooleanContents(Type.isVector(), Type.isFloatingPoint());
   }

+  /// Promote the given target boolean to a target boolean of the given type.
+  /// A target boolean is an integer value, not necessarily of type i1, the bits
+  /// of which conform to getBooleanContents.
+  ///
+  /// ValVT is the type of values that produced the boolean.
+  SDValue promoteTargetBoolean(SelectionDAG &DAG, SDValue Bool,
+                               EVT ValVT) const {
+    SDLoc dl(Bool);
+    EVT BoolVT =
+        getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), ValVT);
+    ISD::NodeType ExtendCode = getExtendForContent(getBooleanContents(ValVT));
+    return DAG.getNode(ExtendCode, dl, BoolVT, Bool);
+  }
+
   /// Return target scheduling preference.
   Sched::Preference getSchedulingPreference() const {
     return SchedPreferenceInfo;
   }

   /// Some scheduler, e.g. hybrid, can switch to different scheduling heuristics
   /// for different nodes. This function returns the preference (or none) for
   /// the given node.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

 SDValue DAGCombiner::visitMSCATTER(SDNode *N) {
   ...
   return SDValue();
 }

 SDValue DAGCombiner::visitMSTORE(SDNode *N) {
   MaskedStoreSDNode *MST = cast<MaskedStoreSDNode>(N);
   SDValue Mask = MST->getMask();
   SDValue Chain = MST->getChain();
+  SDValue Value = MST->getValue();
+  SDValue Ptr = MST->getBasePtr();
   SDLoc DL(N);

   // Zap masked stores with a zero mask.
   if (ISD::isConstantSplatVectorAllZeros(Mask.getNode()))
     return Chain;

   // If this is a masked load with an all ones mask, we can use a unmasked load.
   // FIXME: Can we do this for indexed, compressing, or truncating stores?
   if (ISD::isConstantSplatVectorAllOnes(Mask.getNode()) && MST->isUnindexed() &&
       !MST->isCompressingStore() && !MST->isTruncatingStore())
     return DAG.getStore(MST->getChain(), SDLoc(N), MST->getValue(),
                         MST->getBasePtr(), MST->getPointerInfo(),
                         MST->getOriginalAlign(), MachineMemOperand::MOStore,
                         MST->getAAInfo());

   // Try transforming N to an indexed store.
   if (CombineToPreIndexedLoadStore(N) || CombineToPostIndexedLoadStore(N))
     return SDValue(N, 0);

+  if (MST->isTruncatingStore() && MST->isUnindexed() &&
+      Value.getValueType().isInteger() &&
+      (!isa<ConstantSDNode>(Value) ||
+       !cast<ConstantSDNode>(Value)->isOpaque())) {
+    APInt TruncDemandedBits =
+        APInt::getLowBitsSet(Value.getScalarValueSizeInBits(),
+                             MST->getMemoryVT().getScalarSizeInBits());
+
+    // See if we can simplify the operation with
+    // SimplifyDemandedBits, which only works if the value has a single use.
+    if (SimplifyDemandedBits(Value, TruncDemandedBits)) {
+      // Re-visit the store if anything changed and the store hasn't been merged
+      // with another node (N is deleted) SimplifyDemandedBits will add Value's
+      // node back to the worklist if necessary, but we also need to re-visit
+      // the Store node itself.
+      if (N->getOpcode() != ISD::DELETED_NODE)
+        AddToWorklist(N);
+      return SDValue(N, 0);
+    }
+  }
+
+  // If this is a TRUNC followed by a masked store, fold this into a masked
+  // truncating store. We can do this even if this is already a masked
+  // truncstore.
+  if ((Value.getOpcode() == ISD::TRUNCATE) && Value.getNode()->hasOneUse() &&
+      MST->isUnindexed() &&
+      TLI.canCombineTruncStore(Value.getOperand(0).getValueType(),
+                               MST->getMemoryVT(), LegalOperations)) {
+    auto Mask = TLI.promoteTargetBoolean(DAG, MST->getMask(),
+                                         Value.getOperand(0).getValueType());
+    return DAG.getMaskedStore(Chain, SDLoc(N), Value.getOperand(0), Ptr,
+                              MST->getOffset(), Mask, MST->getMemoryVT(),
+                              MST->getMemOperand(), MST->getAddressingMode(),
+                              /*IsTruncating=*/true);
+  }
+
   return SDValue();
 }

 SDValue DAGCombiner::visitMGATHER(SDNode *N) {
   MaskedGatherSDNode *MGT = cast<MaskedGatherSDNode>(N);
   SDValue Mask = MGT->getMask();
   SDValue Chain = MGT->getChain();
   SDValue Index = MGT->getIndex();
   SDValue Scale = MGT->getScale();
   SDValue PassThru = MGT->getPassThru();
   SDValue BasePtr = MGT->getBasePtr();
   SDLoc DL(N);

   // Zap gathers with a zero mask.
   if (ISD::isConstantSplatVectorAllZeros(Mask.getNode()))
     return CombineTo(N, PassThru, MGT->getChain());

   if (refineUniformBase(BasePtr, Index, DAG)) {
     SDValue Ops[] = {Chain, PassThru, Mask, BasePtr, Index, Scale};
     return DAG.getMaskedGather(DAG.getVTList(N->getValueType(0), MVT::Other),
                                MGT->getMemoryVT(), DL, Ops,
                                MGT->getMemOperand(), MGT->getIndexType(),
                                MGT->getExtensionType());

llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp

 }

 /// Promote the given target boolean to a target boolean of the given type.
 /// A target boolean is an integer value, not necessarily of type i1, the bits
 /// of which conform to getBooleanContents.
 ///
 /// ValVT is the type of values that produced the boolean.
 SDValue DAGTypeLegalizer::PromoteTargetBoolean(SDValue Bool, EVT ValVT) {
-  SDLoc dl(Bool);
-  EVT BoolVT = getSetCCResultType(ValVT);
-  ISD::NodeType ExtendCode =
-      TargetLowering::getExtendForContent(TLI.getBooleanContents(ValVT));
-  return DAG.getNode(ExtendCode, dl, BoolVT, Bool);
+  return TLI.promoteTargetBoolean(DAG, Bool, ValVT);
 }

 /// Return the lower LoVT bits of Op in Lo and the upper HiVT bits in Hi.
 void DAGTypeLegalizer::SplitInteger(SDValue Op,
                                     EVT LoVT, EVT HiVT,
                                     SDValue &Lo, SDValue &Hi) {
   SDLoc dl(Op);
   assert(LoVT.getSizeInBits() + HiVT.getSizeInBits() ==

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

 SDValue AArch64TargetLowering::LowerFixedLengthVectorStoreToSVE(
   ...
   return DAG.getMaskedStore(Store->getChain(), DL, NewValue,
                             Store->getBasePtr(), Store->getOffset(), Pg, MemVT,
                             Store->getMemOperand(), Store->getAddressingMode(),
                             Store->isTruncatingStore());
 }

 SDValue AArch64TargetLowering::LowerFixedLengthVectorMStoreToSVE(
     SDValue Op, SelectionDAG &DAG) const {
-  auto Store = cast<MaskedStoreSDNode>(Op);
+  auto *Store = cast<MaskedStoreSDNode>(Op);

-  if (Store->isTruncatingStore())
-    return SDValue();
-
   SDLoc DL(Op);
   EVT VT = Store->getValue().getValueType();
   EVT ContainerVT = getContainerForFixedLengthVector(DAG, VT);

   auto NewValue = convertToScalableVector(DAG, ContainerVT, Store->getValue());
   SDValue Mask = convertFixedMaskToScalableVector(Store->getMask(), DAG);

   return DAG.getMaskedStore(
       Store->getChain(), DL, NewValue, Store->getBasePtr(), Store->getOffset(),
       Mask, Store->getMemoryVT(), Store->getMemOperand(),
       Store->getAddressingMode(), Store->isTruncatingStore());
 }

 SDValue AArch64TargetLowering::LowerFixedLengthVectorIntDivideToSVE(
     SDValue Op, SelectionDAG &DAG) const {

llvm/test/CodeGen/AArch64/sve-fixed-length-masked-stores.ll

; RUN: llc -aarch64-sve-vector-bits-min=128 < %s \| FileCheck %s -D#VBYTES=16 -check-prefix=NO_SVE		; RUN: llc -aarch64-sve-vector-bits-min=128 < %s \| FileCheck %s -D#VBYTES=16 -check-prefix=NO_SVE
; RUN: llc -aarch64-sve-vector-bits-min=256 < %s \| FileCheck %s -D#VBYTES=32 -check-prefixes=CHECK		; RUN: llc -aarch64-sve-vector-bits-min=256 < %s \| FileCheck %s -D#VBYTES=32 -check-prefixes=CHECK
; RUN: llc -aarch64-sve-vector-bits-min=384 < %s \| FileCheck %s -D#VBYTES=32 -check-prefixes=CHECK		; RUN: llc -aarch64-sve-vector-bits-min=384 < %s \| FileCheck %s -D#VBYTES=32 -check-prefixes=CHECK
; RUN: llc -aarch64-sve-vector-bits-min=512 < %s \| FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512		; RUN: llc -aarch64-sve-vector-bits-min=512 < %s \| FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512
; RUN: llc -aarch64-sve-vector-bits-min=640 < %s \| FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512		; RUN: llc -aarch64-sve-vector-bits-min=640 < %s \| FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512
; RUN: llc -aarch64-sve-vector-bits-min=768 < %s \| FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512		; RUN: llc -aarch64-sve-vector-bits-min=768 < %s \| FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512
; RUN: llc -aarch64-sve-vector-bits-min=896 < %s \| FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512		; RUN: llc -aarch64-sve-vector-bits-min=896 < %s \| FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512
; RUN: llc -aarch64-sve-vector-bits-min=1024 < %s \| FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024		; RUN: llc -aarch64-sve-vector-bits-min=1024 < %s \| FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024
; RUN: llc -aarch64-sve-vector-bits-min=1152 < %s \| FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024		; RUN: llc -aarch64-sve-vector-bits-min=1152 < %s \| FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024
; RUN: llc -aarch64-sve-vector-bits-min=1280 < %s \| FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024		; RUN: llc -aarch64-sve-vector-bits-min=1280 < %s \| FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024
; RUN: llc -aarch64-sve-vector-bits-min=1408 < %s \| FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024		; RUN: llc -aarch64-sve-vector-bits-min=1408 < %s \| FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024
; RUN: llc -aarch64-sve-vector-bits-min=1536 < %s \| FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024		; RUN: llc -aarch64-sve-vector-bits-min=1536 < %s \| FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024
; RUN: llc -aarch64-sve-vector-bits-min=1664 < %s \| FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024		; RUN: llc -aarch64-sve-vector-bits-min=1664 < %s \| FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024
; RUN: llc -aarch64-sve-vector-bits-min=1792 < %s \| FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024		; RUN: llc -aarch64-sve-vector-bits-min=1792 < %s \| FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024
; RUN: llc -aarch64-sve-vector-bits-min=1920 < %s \| FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024		; RUN: llc -aarch64-sve-vector-bits-min=1920 < %s \| FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024
; RUN: llc -aarch64-sve-vector-bits-min=2048 < %s \| FileCheck %s -D#VBYTES=256 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024,VBITS_GE_2048		; RUN: llc -aarch64-sve-vector-bits-min=2048 < %s \| FileCheck %s -D#VBYTES=256 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024,VBITS_GE_2048
		RKSimonUnsubmitted Not Done Reply Inline Actions Any chance that this can be cleaned up and you use the update_tests script? The check coverage for these tests looks very inconsistent - some tests only check VBITS_GE_1024/2048 and the VBITS_GE_512 is reused by 1024/2048 without a 512-bit fallback. RKSimon: Any chance that this can be cleaned up and you use the update_tests script? The check…

target triple = "aarch64-unknown-linux-gnu"		target triple = "aarch64-unknown-linux-gnu"

; Don't use SVE when its registers are no bigger than NEON.		; Don't use SVE when its registers are no bigger than NEON.
; NO_SVE-NOT: ptrue		; NO_SVE-NOT: ptrue

;;		;;
;; Masked Stores		;; Masked Stores
[...]
; VBITS_GE_2048-NEXT: ret
%mask = fcmp oeq <64 x float> %a, %b		%mask = fcmp oeq <64 x float> %a, %b
call void @llvm.masked.store.v64f32(<64 x float> %a, <64 x float>* %ap, i32 8, <64 x i1> %mask)		call void @llvm.masked.store.v64f32(<64 x float> %a, <64 x float>* %ap, i32 8, <64 x i1> %mask)
ret void		ret void
}		}

define void @masked_store_trunc_v8i64i8(<8 x i64>* %ap, <8 x i64>* %bp, <8 x i8>* %dest) #0 {		define void @masked_store_trunc_v8i64i8(<8 x i64>* %ap, <8 x i64>* %bp, <8 x i8>* %dest) #0 {
; VBITS_GE_512-LABEL: masked_store_trunc_v8i64i8:		; VBITS_GE_512-LABEL: masked_store_trunc_v8i64i8:
; VBITS_GE_512: // %bb.0:		; VBITS_GE_512: // %bb.0:
; VBITS_GE_512-NEXT: ptrue p0.d, vl8		; VBITS_GE_512-NEXT: ptrue p[[P0:[0-9]+]].d, vl8
; VBITS_GE_512-NEXT: ld1d { z0.d }, p0/z, [x0]		; VBITS_GE_512-NEXT: ld1d { [[Z0:z[0-9]+]].d }, p0/z, [x0]
; VBITS_GE_512-NEXT: ld1d { z1.d }, p0/z, [x1]		; VBITS_GE_512-NEXT: ld1d { [[Z1:z[0-9]+]].d }, p0/z, [x1]
; VBITS_GE_512-NEXT: cmpeq p0.d, p0/z, z0.d, z1.d		; VBITS_GE_512-NEXT: cmpeq p[[P1:[0-9]+]].d, p[[P0]]/z, [[Z0]].d, [[Z1]].d
; VBITS_GE_512-NEXT: uzp1 z0.s, z0.s, z0.s		; VBITS_GE_512-NEXT: st1b { [[Z0]].d }, p[[P1]], [x{{[0-9]+}}]
; VBITS_GE_512-NEXT: mov z1.d, p0/z, #-1 // =0xffffffffffffffff
; VBITS_GE_512-NEXT: ptrue p0.b, vl8
; VBITS_GE_512-NEXT: uzp1 z1.s, z1.s, z1.s
; VBITS_GE_512-NEXT: uzp1 z0.h, z0.h, z0.h
; VBITS_GE_512-NEXT: uzp1 z1.h, z1.h, z1.h
; VBITS_GE_512-NEXT: uzp1 z0.b, z0.b, z0.b
; VBITS_GE_512-NEXT: uzp1 z1.b, z1.b, z1.b
; VBITS_GE_512-NEXT: cmpne p0.b, p0/z, z1.b, #0
; VBITS_GE_512-NEXT: st1b { z0.b }, p0, [x2]
; VBITS_GE_512-NEXT: ret		; VBITS_GE_512-NEXT: ret

%a = load <8 x i64>, <8 x i64>* %ap		%a = load <8 x i64>, <8 x i64>* %ap
%b = load <8 x i64>, <8 x i64>* %bp		%b = load <8 x i64>, <8 x i64>* %bp
%mask = icmp eq <8 x i64> %a, %b		%mask = icmp eq <8 x i64> %a, %b
%val = trunc <8 x i64> %a to <8 x i8>		%val = trunc <8 x i64> %a to <8 x i8>
call void @llvm.masked.store.v8i8(<8 x i8> %val, <8 x i8>* %dest, i32 8, <8 x i1> %mask)		call void @llvm.masked.store.v8i8(<8 x i8> %val, <8 x i8>* %dest, i32 8, <8 x i1> %mask)
ret void		ret void
}		}

define void @masked_store_trunc_v8i64i16(<8 x i64>* %ap, <8 x i64>* %bp, <8 x i16>* %dest) #0 {		define void @masked_store_trunc_v8i64i16(<8 x i64>* %ap, <8 x i64>* %bp, <8 x i16>* %dest) #0 {
; VBITS_GE_512-LABEL: masked_store_trunc_v8i64i16:		; CHECK-LABEL: masked_store_trunc_v8i64i16:
; VBITS_GE_512: // %bb.0:		; VBITS_GE_512: ptrue p[[P0:[0-9]+]].d, vl8
; VBITS_GE_512-NEXT: ptrue p0.d, vl8		; VBITS_GE_512-NEXT: ld1d { [[Z0:z[0-9]+]].d }, p0/z, [x0]
; VBITS_GE_512-NEXT: ld1d { z0.d }, p0/z, [x0]		; VBITS_GE_512-NEXT: ld1d { [[Z1:z[0-9]+]].d }, p0/z, [x1]
; VBITS_GE_512-NEXT: ld1d { z1.d }, p0/z, [x1]		; VBITS_GE_512-NEXT: cmpeq p[[P1:[0-9]+]].d, p[[P0]]/z, [[Z0]].d, [[Z1]].d
; VBITS_GE_512-NEXT: cmpeq p0.d, p0/z, z0.d, z1.d		; VBITS_GE_512-NEXT: st1h { [[Z0]].d }, p[[P1]], [x{{[0-9]+}}]
; VBITS_GE_512-NEXT: uzp1 z0.s, z0.s, z0.s
; VBITS_GE_512-NEXT: mov z1.d, p0/z, #-1 // =0xffffffffffffffff
; VBITS_GE_512-NEXT: ptrue p0.h, vl8
; VBITS_GE_512-NEXT: uzp1 z1.s, z1.s, z1.s
; VBITS_GE_512-NEXT: uzp1 z0.h, z0.h, z0.h
; VBITS_GE_512-NEXT: uzp1 z1.h, z1.h, z1.h
; VBITS_GE_512-NEXT: cmpne p0.h, p0/z, z1.h, #0
; VBITS_GE_512-NEXT: st1h { z0.h }, p0, [x2]
; VBITS_GE_512-NEXT: ret		; VBITS_GE_512-NEXT: ret
%a = load <8 x i64>, <8 x i64>* %ap		%a = load <8 x i64>, <8 x i64>* %ap
%b = load <8 x i64>, <8 x i64>* %bp		%b = load <8 x i64>, <8 x i64>* %bp
%mask = icmp eq <8 x i64> %a, %b		%mask = icmp eq <8 x i64> %a, %b
%val = trunc <8 x i64> %a to <8 x i16>		%val = trunc <8 x i64> %a to <8 x i16>
call void @llvm.masked.store.v8i16(<8 x i16> %val, <8 x i16>* %dest, i32 8, <8 x i1> %mask)		call void @llvm.masked.store.v8i16(<8 x i16> %val, <8 x i16>* %dest, i32 8, <8 x i1> %mask)
ret void		ret void
}		}

define void @masked_store_trunc_v8i64i32(<8 x i64>* %ap, <8 x i64>* %bp, <8 x i32>* %dest) #0 {		define void @masked_store_trunc_v8i64i32(<8 x i64>* %ap, <8 x i64>* %bp, <8 x i32>* %dest) #0 {
; VBITS_GE_512-LABEL: masked_store_trunc_v8i64i32:		; CHECK-LABEL: masked_store_trunc_v8i64i32:
; VBITS_GE_512: // %bb.0:		; VBITS_GE_512: ptrue p[[P0:[0-9]+]].d, vl8
; VBITS_GE_512-NEXT: ptrue p0.d, vl8		; VBITS_GE_512-NEXT: ld1d { [[Z0:z[0-9]+]].d }, p0/z, [x0]
; VBITS_GE_512-NEXT: ld1d { z0.d }, p0/z, [x0]		; VBITS_GE_512-NEXT: ld1d { [[Z1:z[0-9]+]].d }, p0/z, [x1]
; VBITS_GE_512-NEXT: ld1d { z1.d }, p0/z, [x1]		; VBITS_GE_512-NEXT: cmpeq p[[P1:[0-9]+]].d, p[[P0]]/z, [[Z0]].d, [[Z1]].d
; VBITS_GE_512-NEXT: cmpeq p0.d, p0/z, z0.d, z1.d		; VBITS_GE_512-NEXT: st1w { [[Z0]].d }, p[[P1]], [x{{[0-9]+}}]
; VBITS_GE_512-NEXT: uzp1 z0.s, z0.s, z0.s
; VBITS_GE_512-NEXT: mov z1.d, p0/z, #-1 // =0xffffffffffffffff
; VBITS_GE_512-NEXT: ptrue p0.s, vl8
; VBITS_GE_512-NEXT: uzp1 z1.s, z1.s, z1.s
; VBITS_GE_512-NEXT: cmpne p0.s, p0/z, z1.s, #0
; VBITS_GE_512-NEXT: st1w { z0.s }, p0, [x2]
; VBITS_GE_512-NEXT: ret		; VBITS_GE_512-NEXT: ret
%a = load <8 x i64>, <8 x i64>* %ap		%a = load <8 x i64>, <8 x i64>* %ap
%b = load <8 x i64>, <8 x i64>* %bp		%b = load <8 x i64>, <8 x i64>* %bp
%mask = icmp eq <8 x i64> %a, %b		%mask = icmp eq <8 x i64> %a, %b
%val = trunc <8 x i64> %a to <8 x i32>		%val = trunc <8 x i64> %a to <8 x i32>
call void @llvm.masked.store.v8i32(<8 x i32> %val, <8 x i32>* %dest, i32 8, <8 x i1> %mask)		call void @llvm.masked.store.v8i32(<8 x i32> %val, <8 x i32>* %dest, i32 8, <8 x i1> %mask)
ret void		ret void
}		}

define void @masked_store_trunc_v16i32i8(<16 x i32>* %ap, <16 x i32>* %bp, <16 x i8>* %dest) #0 {		define void @masked_store_trunc_v16i32i8(<16 x i32>* %ap, <16 x i32>* %bp, <16 x i8>* %dest) #0 {
; VBITS_GE_512-LABEL: masked_store_trunc_v16i32i8:		; CHECK-LABEL: masked_store_trunc_v16i32i8:
; VBITS_GE_512: // %bb.0:		; VBITS_GE_512: ptrue p[[P0:[0-9]+]].s, vl16
; VBITS_GE_512-NEXT: ptrue p0.s, vl16		; VBITS_GE_512-NEXT: ld1w { [[Z0:z[0-9]+]].s }, p0/z, [x0]
; VBITS_GE_512-NEXT: ld1w { z0.s }, p0/z, [x0]		; VBITS_GE_512-NEXT: ld1w { [[Z1:z[0-9]+]].s }, p0/z, [x1]
; VBITS_GE_512-NEXT: ld1w { z1.s }, p0/z, [x1]		; VBITS_GE_512-NEXT: cmpeq p[[P1:[0-9]+]].s, p[[P0]]/z, [[Z0]].s, [[Z1]].s
; VBITS_GE_512-NEXT: cmpeq p0.s, p0/z, z0.s, z1.s		; VBITS_GE_512-NEXT: st1b { [[Z0]].s }, p[[P1]], [x{{[0-9]+}}]
; VBITS_GE_512-NEXT: uzp1 z0.h, z0.h, z0.h
; VBITS_GE_512-NEXT: mov z1.s, p0/z, #-1 // =0xffffffffffffffff
; VBITS_GE_512-NEXT: ptrue p0.b, vl16
; VBITS_GE_512-NEXT: uzp1 z1.h, z1.h, z1.h
; VBITS_GE_512-NEXT: uzp1 z0.b, z0.b, z0.b
; VBITS_GE_512-NEXT: uzp1 z1.b, z1.b, z1.b
; VBITS_GE_512-NEXT: cmpne p0.b, p0/z, z1.b, #0
; VBITS_GE_512-NEXT: st1b { z0.b }, p0, [x2]
; VBITS_GE_512-NEXT: ret		; VBITS_GE_512-NEXT: ret
%a = load <16 x i32>, <16 x i32>* %ap		%a = load <16 x i32>, <16 x i32>* %ap
%b = load <16 x i32>, <16 x i32>* %bp		%b = load <16 x i32>, <16 x i32>* %bp
%mask = icmp eq <16 x i32> %a, %b		%mask = icmp eq <16 x i32> %a, %b
%val = trunc <16 x i32> %a to <16 x i8>		%val = trunc <16 x i32> %a to <16 x i8>
call void @llvm.masked.store.v16i8(<16 x i8> %val, <16 x i8>* %dest, i32 8, <16 x i1> %mask)		call void @llvm.masked.store.v16i8(<16 x i8> %val, <16 x i8>* %dest, i32 8, <16 x i1> %mask)
ret void		ret void
}		}

define void @masked_store_trunc_v16i32i16(<16 x i32>* %ap, <16 x i32>* %bp, <16 x i16>* %dest) #0 {		define void @masked_store_trunc_v16i32i16(<16 x i32>* %ap, <16 x i32>* %bp, <16 x i16>* %dest) #0 {
; VBITS_GE_512-LABEL: masked_store_trunc_v16i32i16:		; CHECK-LABEL: masked_store_trunc_v16i32i16:
; VBITS_GE_512: // %bb.0:		; VBITS_GE_512: ptrue p[[P0:[0-9]+]].s, vl16
; VBITS_GE_512-NEXT: ptrue p0.s, vl16		; VBITS_GE_512-NEXT: ld1w { [[Z0:z[0-9]+]].s }, p0/z, [x0]
; VBITS_GE_512-NEXT: ld1w { z0.s }, p0/z, [x0]		; VBITS_GE_512-NEXT: ld1w { [[Z1:z[0-9]+]].s }, p0/z, [x1]
; VBITS_GE_512-NEXT: ld1w { z1.s }, p0/z, [x1]		; VBITS_GE_512-NEXT: cmpeq p[[P1:[0-9]+]].s, p[[P0]]/z, [[Z0]].s, [[Z1]].s
; VBITS_GE_512-NEXT: cmpeq p0.s, p0/z, z0.s, z1.s		; VBITS_GE_512-NEXT: st1h { [[Z0]].s }, p[[P1]], [x{{[0-9]+}}]
; VBITS_GE_512-NEXT: uzp1 z0.h, z0.h, z0.h
; VBITS_GE_512-NEXT: mov z1.s, p0/z, #-1 // =0xffffffffffffffff
; VBITS_GE_512-NEXT: ptrue p0.h, vl16
; VBITS_GE_512-NEXT: uzp1 z1.h, z1.h, z1.h
; VBITS_GE_512-NEXT: cmpne p0.h, p0/z, z1.h, #0
; VBITS_GE_512-NEXT: st1h { z0.h }, p0, [x2]
; VBITS_GE_512-NEXT: ret		; VBITS_GE_512-NEXT: ret
%a = load <16 x i32>, <16 x i32>* %ap		%a = load <16 x i32>, <16 x i32>* %ap
%b = load <16 x i32>, <16 x i32>* %bp		%b = load <16 x i32>, <16 x i32>* %bp
%mask = icmp eq <16 x i32> %a, %b		%mask = icmp eq <16 x i32> %a, %b
%val = trunc <16 x i32> %a to <16 x i16>		%val = trunc <16 x i32> %a to <16 x i16>
call void @llvm.masked.store.v16i16(<16 x i16> %val, <16 x i16>* %dest, i32 8, <16 x i1> %mask)		call void @llvm.masked.store.v16i16(<16 x i16> %val, <16 x i16>* %dest, i32 8, <16 x i1> %mask)
ret void		ret void
}		}

define void @masked_store_trunc_v32i16i8(<32 x i16>* %ap, <32 x i16>* %bp, <32 x i8>* %dest) #0 {		define void @masked_store_trunc_v32i16i8(<32 x i16>* %ap, <32 x i16>* %bp, <32 x i8>* %dest) #0 {
; VBITS_GE_512-LABEL: masked_store_trunc_v32i16i8:		; CHECK-LABEL: masked_store_trunc_v32i16i8:
; VBITS_GE_512: // %bb.0:		; VBITS_GE_512: ptrue p[[P0:[0-9]+]].h, vl32
; VBITS_GE_512-NEXT: ptrue p0.h, vl32		; VBITS_GE_512-NEXT: ld1h { [[Z0:z[0-9]+]].h }, p0/z, [x0]
; VBITS_GE_512-NEXT: ld1h { z0.h }, p0/z, [x0]		; VBITS_GE_512-NEXT: ld1h { [[Z1:z[0-9]+]].h }, p0/z, [x1]
; VBITS_GE_512-NEXT: ld1h { z1.h }, p0/z, [x1]		; VBITS_GE_512-NEXT: cmpeq p[[P1:[0-9]+]].h, p[[P0]]/z, [[Z0]].h, [[Z1]].h
; VBITS_GE_512-NEXT: cmpeq p0.h, p0/z, z0.h, z1.h		; VBITS_GE_512-NEXT: st1b { [[Z0]].h }, p[[P1]], [x{{[0-9]+}}]
; VBITS_GE_512-NEXT: uzp1 z0.b, z0.b, z0.b
; VBITS_GE_512-NEXT: mov z1.h, p0/z, #-1 // =0xffffffffffffffff
; VBITS_GE_512-NEXT: ptrue p0.b, vl32
; VBITS_GE_512-NEXT: uzp1 z1.b, z1.b, z1.b
; VBITS_GE_512-NEXT: cmpne p0.b, p0/z, z1.b, #0
; VBITS_GE_512-NEXT: st1b { z0.b }, p0, [x2]
; VBITS_GE_512-NEXT: ret		; VBITS_GE_512-NEXT: ret
%a = load <32 x i16>, <32 x i16>* %ap		%a = load <32 x i16>, <32 x i16>* %ap
%b = load <32 x i16>, <32 x i16>* %bp		%b = load <32 x i16>, <32 x i16>* %bp
%mask = icmp eq <32 x i16> %a, %b		%mask = icmp eq <32 x i16> %a, %b
%val = trunc <32 x i16> %a to <32 x i8>		%val = trunc <32 x i16> %a to <32 x i8>
call void @llvm.masked.store.v32i8(<32 x i8> %val, <32 x i8>* %dest, i32 8, <32 x i1> %mask)		call void @llvm.masked.store.v32i8(<32 x i8> %val, <32 x i8>* %dest, i32 8, <32 x i1> %mask)
ret void		ret void
}		}

Revision Contents

Diff 395122

llvm/include/llvm/CodeGen/TargetLowering.h

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

llvm/test/CodeGen/AArch64/sve-fixed-length-masked-stores.ll
