This is an archive of the discontinued LLVM Phabricator instance.

Differential D104471

[llvm][sve] Lowering for VLS truncating stores
Closed · Public

Authored by DavidTruby on Jun 17 2021, 9:58 AM.

Details

Reviewers

efriedma
paulwalker-arm
peterwaller-arm
bsmith
joechrisellis
arsenm

Commits

rG1528a4d40022: [llvm][sve] Lowering for VLS truncating stores
rGc305557acdaa: [llvm][sve] Lowering for VLS truncating stores

Summary

This adds custom lowering for truncating stores when operating on
fixed length vectors in SVE.

Currently truncating stores are not used in certain cases where
the size of the vector is larger than the target vector width.

It also includes an AArch64-specific DAG combine to prevent a
regression where truncates would be merged into stores before
being cancelled out with a prior extend; this DAG combine
merges the extends and truncating stores to obtain a normal store.
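
As an illustration of the combine described above, here is a minimal sketch that uses toy node types rather than SelectionDAG. `ToyNode`, `ToyStore` and `foldTruncStoreOfExtToy` are hypothetical stand-ins, not LLVM APIs; they only mirror the shape of the check: a truncating store whose stored value is an extend of something that already has the memory width can be replaced by a plain store of the original value.

```cpp
#include <cassert>

// Toy stand-ins for DAG nodes (hypothetical; these are not LLVM types).
enum class ToyOp { Value, ZeroExtend, SignExtend, AnyExtend };

struct ToyNode {
  ToyOp Opcode;
  unsigned Bits;           // element width this node produces
  const ToyNode *Operand;  // input of an extend, nullptr otherwise
};

struct ToyStore {
  const ToyNode *Stored;  // value being stored
  unsigned MemBits;       // element width written to memory
};

bool isTruncating(const ToyStore &S) { return S.MemBits < S.Stored->Bits; }

// If S is a truncating store of an extend whose input already has the memory
// width, write the equivalent plain store to Out and return true.
bool foldTruncStoreOfExtToy(const ToyStore &S, ToyStore &Out) {
  if (!isTruncating(S))
    return false;
  const ToyNode *Ext = S.Stored;
  if (Ext->Opcode == ToyOp::Value)
    return false;  // not an extend
  assert(Ext->Operand && "extend nodes must have an input");
  const ToyNode *Orig = Ext->Operand;
  if (Orig->Bits != S.MemBits)
    return false;  // widths differ, so extend and truncate do not cancel
  Out.Stored = Orig;
  Out.MemBits = S.MemBits;
  return true;
}
```

In this toy model, storing an i8 value that was zero-extended to i32 and then truncated back to i8 folds to a plain i8 store, while truncating the same extend to i16 is left alone.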

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

DavidTruby created this revision. · Jun 17 2021, 9:58 AM

Herald added a reviewer: efriedma. · Jun 17 2021, 9:58 AM

Herald added subscribers: ecnelises, psnobl, hiraditya, tschuett.

DavidTruby requested review of this revision. · Jun 17 2021, 9:58 AM

Herald added a project: Restricted Project. · Jun 17 2021, 9:58 AM

Herald added a subscriber: llvm-commits.

DavidTruby added reviewers: paulwalker-arm, peterwaller-arm, bsmith, joechrisellis. · Jun 17 2021, 9:59 AM

This patch is making way too many seemingly unrelated code changes without much explanation. Is all of this really necessary for the initial patch?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1245	Can we change this to make sure we only mark stores that we can actually lower "custom"? (i.e. the value type is a legal integer vector, and the memory type has a legal element type.) That should make the rest of the patch simpler, I think.
llvm/test/CodeGen/AArch64/sve-fixed-length-trunc-stores.ll
22	Why don't we want to optimize store_trunc_v2i64i8 when SVE registers are 128 bits wide?
184	Can we "uzp1 z0, z0, z1" or something like that in store_trunc_v16i32i16?

Harbormaster completed remote builds in B109750: Diff 352774. · Jun 17 2021, 9:07 PM

Matt added a subscriber: Matt. · Jun 18 2021, 11:37 AM

In D104471#2825191, @efriedma wrote:

This patch is making way too many seemingly unrelated code changes without much explanation. Is all of this really necessary for the initial patch?

Hi, thanks for the review! I don't think I explained well in the summary why the extra changes are needed. I've hopefully added better explanations below.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1245	I'm not sure what doing this would allow us to remove for the rest of the patch, could you point me at it?
15166	This unrelated-appearing addition is necessary to prevent a regression in some code generation we already had. In essence, sometimes we were generating an instruction, extending the result, truncating that and storing; we were relying on a later fold of an extend and truncate but since the fold into the truncating store happens first we were left with an extra extend then a truncating store. This function removes that extra extend that we were generating in those cases. I realise it's somewhat unrelated appearing to the rest of the patch but I didn't want to introduce a regression and change all the tests and then fix the regression in a separate patch and change the tests back.
llvm/test/CodeGen/AArch64/sve-fixed-length-trunc-stores.ll
184	The code generation of these legalized types is poor and needs fixing in a separate patch, as these should really be using truncating stores as well. These tests just check that legalised code is correct rather than good at the moment.

Looking again with your explanation, I understand a bit more how the pieces fit together; it just wasn't obvious at first glance.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1245	Well, the change to guard LowerTruncateVectorStore is only necessary because of this, I think. And it probably messes with DAGCombine heuristics a little. But neither of those is a big deal, I guess. I think it's still worth doing to illustrate the intent here, though; it's hard for someone reading this to understand why you're custom-lowering every possible truncstore.
17878	Not sure what this change is doing here; I don't see any test coverage for masked stores?

DavidTruby added inline comments. · Jun 28 2021, 5:51 PM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17878	This is a mistake, I was working on masked truncating stores alongside this before deciding they should be two separate patches due to the size. Thanks for spotting it!

paulwalker-arm added inline comments. · Jun 29 2021, 3:58 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1245	Perhaps this should be moved into addTypeForFixedLengthSVE, or at least part of it; then here we just need to worry about the 128-bit and 64-bit data vectors, which is consistent with how we handle the other nodes when it comes to SVE's fixed-length lowering support. When doing this we can also speed up the InnerVT loop, since we only care about a small subset of types (namely starting at i8 element types and doubling until we reach VT's element type). For example:

// if VT.isInteger
InnerVT = VT.changeVectorElementType(i8);
while (InnerVT != VT) {
  setTruncStoreAction(VT, InnerVT, Custom);
  InnerVT = InnerVT.widenIntegerVectorElementType();
}

When it comes to the 128-bit and 64-bit data vector support we may not even need a loop, given we know the exact MVTs we want to custom lower.
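
To make the "start at i8 and double" idea concrete, here is a small standalone sketch that works on plain bit widths instead of MVTs. `truncStoreElementWidths` is a hypothetical helper, not an LLVM function; it lists the memory element widths that would be marked Custom for a vector with a given integer element width.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not an LLVM API): element widths, in bits, that a
// vector of OuterBits-wide integers can be truncated to when stored. Starts
// at 8 and doubles, stopping before OuterBits itself.
std::vector<unsigned> truncStoreElementWidths(unsigned OuterBits) {
  assert(OuterBits >= 8 && (OuterBits & (OuterBits - 1)) == 0 &&
         "expected a power-of-two width of at least 8 bits");
  std::vector<unsigned> Widths;
  for (unsigned Inner = 8; Inner != OuterBits; Inner *= 2)
    Widths.push_back(Inner);
  return Widths;
}
```

For 64-bit elements this visits 8, 16 and 32, so only three combinations are marked per value type instead of iterating over every integer vector type.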

DavidTruby edited the summary of this revision. · Jun 29 2021, 5:27 AM

Herald added a subscriber: kristof.beyls. · Jun 29 2021, 5:27 AM

Fixes based on review comments

I seem to be causing some failures on MIPS and AMDGPU with this patch and I can't really work out why. Don't suppose anyone has any pointers?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1245	I couldn't use `widenIntegerVectorElementType` here because for some reason it requires a Context. I think what I've done should be equivalent though.

Harbormaster completed remote builds in B111525: Diff 355231. · Jun 29 2021, 7:51 AM

I seem to be causing some failures on MIPS and AMDGPU with this patch and I can't really work out why. Don't suppose anyone has any pointers?

This is obviously the DAGCombiner changes. Not sure what sort of pointers you're looking for; either the changes are desirable for mips/amdgpu, or they're not and the combine needs to be restricted somehow.

Here I've fixed the test breakages on MIPS and AMDGPU.

I believe the new MIPS code generated is both correct and better
than the existing code generation but I'd like someone more
familiar with the architecture than me to verify that!

The AMDGPU change is more involved, as it revolves around the fact
that, from what I understand, truncating stores of vectors are not
supported on R600. As such we don't want to tell the DAG combiner
that we want more truncating stores. I'm not really sure that what
I've done here is the "correct" way to handle this; it feels a bit
hacky to me, to be honest, so if anyone has any better suggestions
I would like to hear them.

Herald added subscribers: foad, kerbowa, atanasyan and 5 others. · Jul 1 2021, 9:55 AM

Harbormaster completed remote builds in B112020: Diff 355926. · Jul 1 2021, 10:44 AM

For MIPS, it looks like we're somehow eliminating zero-extension operations which aren't getting eliminated otherwise. Not sure why we don't manage to perform the transform otherwise, but seems fine. Actually, maybe this is an argument for generating truncstores even more aggressively than your patch; in theory, before operation legalization, we could do the transform even if it's "expand".

For R600, it looks like combining to a truncstore blocks an important target-specific DAGCombine; the combines don't support loadext/truncstore. See AMDGPUTargetLowering::shouldCombineMemoryType. You could try to fix the target-specific combines, or just add a target hook, I guess.

llvm/include/llvm/CodeGen/TargetLowering.h
1268	If you're going to add a target hook for this, please make it a separate function from isTruncStoreLegalOrCustom, so it's clear what it's doing.
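
The kind of separate hook being requested can be sketched with toy classes. `ToyLowering` and `ToyR600Lowering` are hypothetical and far simpler than `TargetLowering`; the sketch only shows a default that accepts Legal or Custom truncating stores and a target override that insists on Legal.

```cpp
#include <cassert>

enum class ToyAction { Legal, Custom, Expand };  // hypothetical subset

// Toy base class (not LLVM's TargetLowering): one action for all truncstores.
class ToyLowering {
public:
  explicit ToyLowering(ToyAction A) : TruncStoreAction(A) {}
  virtual ~ToyLowering() = default;

  bool isTruncStoreLegal() const {
    return TruncStoreAction == ToyAction::Legal;
  }
  bool isTruncStoreLegalOrCustom() const {
    return TruncStoreAction == ToyAction::Legal ||
           TruncStoreAction == ToyAction::Custom;
  }

  // Separate, overridable hook consulted by the combiner.
  virtual bool canCombineTruncStore(bool LegalOnly) const {
    return LegalOnly ? isTruncStoreLegal() : isTruncStoreLegalOrCustom();
  }

private:
  ToyAction TruncStoreAction;
};

// Toy target that marks truncstores Custom but does not want the combiner
// to create more of them.
class ToyR600Lowering : public ToyLowering {
public:
  using ToyLowering::ToyLowering;
  bool canCombineTruncStore(bool /*LegalOnly*/) const override {
    return isTruncStoreLegal();
  }
};
```

Keeping the hook separate from the legality query means the query keeps its plain meaning, while a target can still opt out of the combine.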

Herald added a subscriber: wdng. · Jul 1 2021, 12:53 PM

efriedma added inline comments. · Jul 1 2021, 12:56 PM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
18097	There's a potential infinite loop here: if a truncstore is "custom", and expands to trunc+store, combining never ends. You can avoid this by checking for LegalOperations.
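
The loop described here can be modelled with a tiny state machine. Everything below is a hypothetical toy, not LLVM code: it assumes the worst case from the comment, where the custom lowering expands a truncstore straight back to trunc+store, and it counts how many rewrites happen before the system settles or a step budget runs out.

```cpp
#include <cassert>

enum class ToyAction { Legal, Custom };              // hypothetical subset
enum class ToyForm { TruncThenStore, TruncStore };   // the two shapes

// Same shape as the hook in the diff: once operations are legal, only a
// Legal truncstore may be formed.
bool canCombineToy(ToyAction A, bool LegalOperations) {
  if (LegalOperations)
    return A == ToyAction::Legal;
  return A == ToyAction::Legal || A == ToyAction::Custom;
}

// Alternates "combine trunc+store into truncstore" with "custom lowering
// expands truncstore back", and returns the number of rewrites performed.
unsigned countRewritesToy(ToyAction A, bool LegalOperations,
                          unsigned MaxSteps) {
  assert(MaxSteps > 0 && "need a positive step budget");
  ToyForm F = ToyForm::TruncThenStore;
  unsigned Steps = 0;
  while (Steps < MaxSteps) {
    if (F == ToyForm::TruncThenStore) {
      if (!canCombineToy(A, LegalOperations))
        break;  // combine refused: fixed point reached
      F = ToyForm::TruncStore;
    } else {
      if (A != ToyAction::Custom)
        break;  // a Legal truncstore is left alone
      F = ToyForm::TruncThenStore;  // custom lowering expands it again
    }
    ++Steps;
  }
  return Steps;
}
```

With a Custom action and the guard ignored (`LegalOperations` passed as false), the toy exhausts any step budget; with the guard it performs no rewrites at all, and a Legal truncstore is formed once and then left alone.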

Renamed R600 target hook and added check for LegalOperations.

DavidTruby marked 2 inline comments as done. · Jul 6 2021, 8:59 AM

bsmith added inline comments. · Jul 6 2021, 9:03 AM

llvm/test/CodeGen/AArch64/sve-fixed-length-masked-gather.ll
151–152	This test change looks incorrect

efriedma added inline comments. · Jul 6 2021, 9:51 AM

llvm/test/CodeGen/Mips/msa/f16-llvm-ir.ll
1012 ↗	(On Diff #356747)	Not sure what this change is about?

Harbormaster completed remote builds in B112636: Diff 356747. · Jul 6 2021, 11:17 AM

Fixed changes in tests caused by rebase

DavidTruby marked 2 inline comments as done. · Jul 8 2021, 4:34 AM

DavidTruby added inline comments.

llvm/test/CodeGen/Mips/msa/f16-llvm-ir.ll
1012 ↗	(On Diff #356747)	Looks like this snuck in when I rebased, I've removed it as a change.

DavidTruby marked an inline comment as done. · Jul 8 2021, 4:36 AM

Harbormaster completed remote builds in B112962: Diff 357192. · Jul 8 2021, 6:05 AM

LGTM

This revision is now accepted and ready to land. · Jul 8 2021, 11:02 AM

This revision was landed with ongoing or failed builds. · Jul 12 2021, 3:14 AM

Closed by commit rGc305557acdaa: [llvm][sve] Lowering for VLS truncating stores (authored by DavidTruby).

This revision was automatically updated to reflect the committed changes.

DavidTruby added a commit: rGc305557acdaa: [llvm][sve] Lowering for VLS truncating stores.

Seems like this change caused a seg fault in chrome android PGO builds.

reproduces with: clang -cc1 -triple aarch64-unknown-linux-android21 -emit-obj -Oz t.cpp
and cpp file is attached

t.cpp (521 B)

will revert this for now

akhuang added a reverting change: rGfd972bb9fd78: Revert "[llvm][sve] Lowering for VLS truncating stores" because it. · Jul 19 2021, 11:03 AM

DavidTruby added a commit: rG1528a4d40022: [llvm][sve] Lowering for VLS truncating stores. · Jul 23 2021, 6:05 AM

@akhuang thanks for the reproducer, I've committed this again with a small fix that prevents that issue from occurring. In essence, I was not checking in the DAG combine here whether the store was an indexed store (in which case we can't perform the optimisation). I've added a check for that now and your example passes. If you see any more issues please revert again and let me know. Thanks!

Revision Contents

Path                                                          Size
llvm/include/llvm/CodeGen/TargetLowering.h                    8 lines
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp                 9 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp               49 lines
llvm/lib/Target/AMDGPU/R600ISelLowering.h                     9 lines
llvm/test/CodeGen/AArch64/sve-fixed-length-masked-gather.ll   13 lines
llvm/test/CodeGen/AArch64/sve-fixed-length-trunc-stores.ll    218 lines
llvm/test/CodeGen/Mips/cconv/byval.ll                         13 lines
llvm/test/CodeGen/Mips/cconv/vector.ll                        12 lines
llvm/test/CodeGen/Mips/llvm-ir/store.ll                       32 lines

Diff 357880

llvm/include/llvm/CodeGen/TargetLowering.h

[1,259 lines hidden above; context: #include "llvm/IR/ConstrainedOps.def"]
   /// Return true if the specified store with truncation is legal on this
   /// target.
   bool isTruncStoreLegal(EVT ValVT, EVT MemVT) const {
     return isTypeLegal(ValVT) && getTruncStoreAction(ValVT, MemVT) == Legal;
   }

   /// Return true if the specified store with truncation has solution on this
   /// target.
   bool isTruncStoreLegalOrCustom(EVT ValVT, EVT MemVT) const {
     return isTypeLegal(ValVT) &&
       (getTruncStoreAction(ValVT, MemVT) == Legal ||
        getTruncStoreAction(ValVT, MemVT) == Custom);
   }

+  virtual bool canCombineTruncStore(EVT ValVT, EVT MemVT,
+                                    bool LegalOnly) const {
+    if (LegalOnly)
+      return isTruncStoreLegal(ValVT, MemVT);
+
+    return isTruncStoreLegalOrCustom(ValVT, MemVT);
+  }
+
   /// Return how the indexed load should be treated: either it is legal, needs
   /// to be promoted to a larger size, needs to be expanded to some other code
   /// sequence, or the target has a custom expander for it.
   LegalizeAction getIndexedLoadAction(unsigned IdxMode, MVT VT) const {
     return getIndexedModeAction(IdxMode, VT, IMAB_Load);
   }

   /// Return true if the specified indexed load is legal on this target.
[3,390 lines hidden below]

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

[18,084 lines hidden above; context: if (ST->isUnindexed() && ST->isSimple() &&]
           return SDValue();
         }
       }
     }
   }

   // If this is an FP_ROUND or TRUNC followed by a store, fold this into a
   // truncating store. We can do this even if this is already a truncstore.
-  if ((Value.getOpcode() == ISD::FP_ROUND || Value.getOpcode() == ISD::TRUNCATE)
-      && Value.getNode()->hasOneUse() && ST->isUnindexed() &&
-      TLI.isTruncStoreLegal(Value.getOperand(0).getValueType(),
-                            ST->getMemoryVT())) {
+  if ((Value.getOpcode() == ISD::FP_ROUND ||
+       Value.getOpcode() == ISD::TRUNCATE) &&
+      Value.getNode()->hasOneUse() && ST->isUnindexed() &&
+      TLI.canCombineTruncStore(Value.getOperand(0).getValueType(),
+                               ST->getMemoryVT(), LegalOperations)) {
     return DAG.getTruncStore(Chain, SDLoc(N), Value.getOperand(0),
                              Ptr, ST->getMemoryVT(), ST->getMemOperand());
   }

   // Always perform this optimization before types are legal. If the target
   // prefers, also try this after legalization to catch stores that were created
   // by intrinsics or other nodes.
   if (!LegalTypes || (TLI.mergeStoresAfterLegalization(ST->getMemoryVT()))) {
[5,224 lines hidden below]

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[1,234 lines hidden above; context: for (MVT VT : MVT::fp_scalable_vector_valuetypes()) {]
         setTruncStoreAction(VT, InnerVT, Expand);
         // SVE does not have floating-point extending loads.
         setLoadExtAction(ISD::SEXTLOAD, VT, InnerVT, Expand);
         setLoadExtAction(ISD::ZEXTLOAD, VT, InnerVT, Expand);
         setLoadExtAction(ISD::EXTLOAD, VT, InnerVT, Expand);
       }
     }

+    // SVE supports truncating stores of 64 and 128-bit vectors
+    setTruncStoreAction(MVT::v2i64, MVT::v2i8, Custom);
+    setTruncStoreAction(MVT::v2i64, MVT::v2i16, Custom);
+    setTruncStoreAction(MVT::v2i64, MVT::v2i32, Custom);
+    setTruncStoreAction(MVT::v2i32, MVT::v2i8, Custom);
+    setTruncStoreAction(MVT::v2i32, MVT::v2i16, Custom);
+
     for (auto VT : {MVT::nxv2f16, MVT::nxv4f16, MVT::nxv8f16, MVT::nxv2f32,
                     MVT::nxv4f32, MVT::nxv2f64}) {
       setOperationAction(ISD::CONCAT_VECTORS, VT, Custom);
       setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);
       setOperationAction(ISD::MGATHER, VT, Custom);
       setOperationAction(ISD::MSCATTER, VT, Custom);
       setOperationAction(ISD::MLOAD, VT, Custom);
       setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);
[230 lines hidden; context: if (VT.isFloatingPoint()) {]
     setCondCodeAction(ISD::SETULT, VT, Expand);
     setCondCodeAction(ISD::SETULE, VT, Expand);
     setCondCodeAction(ISD::SETUGE, VT, Expand);
     setCondCodeAction(ISD::SETUGT, VT, Expand);
     setCondCodeAction(ISD::SETUEQ, VT, Expand);
     setCondCodeAction(ISD::SETUNE, VT, Expand);
   }

+  // Mark integer truncating stores as having custom lowering
+  if (VT.isInteger()) {
+    MVT InnerVT = VT.changeVectorElementType(MVT::i8);
+    while (InnerVT != VT) {
+      setTruncStoreAction(VT, InnerVT, Custom);
+      InnerVT = InnerVT.changeVectorElementType(
+          MVT::getIntegerVT(2 * InnerVT.getScalarSizeInBits()));
+    }
+  }
+
   // Lower fixed length vector operations to scalable equivalents.
   setOperationAction(ISD::ABS, VT, Custom);
   setOperationAction(ISD::ADD, VT, Custom);
   setOperationAction(ISD::AND, VT, Custom);
   setOperationAction(ISD::ANY_EXTEND, VT, Custom);
   setOperationAction(ISD::BITCAST, VT, Custom);
   setOperationAction(ISD::BITREVERSE, VT, Custom);
   setOperationAction(ISD::BSWAP, VT, Custom);
[3,050 lines hidden; context: SDValue AArch64TargetLowering::LowerSTORE(SDValue Op,]
   assert (StoreNode && "Can only custom lower store nodes");

   SDValue Value = StoreNode->getValue();

   EVT VT = Value.getValueType();
   EVT MemVT = StoreNode->getMemoryVT();

   if (VT.isVector()) {
-    if (useSVEForFixedLengthVectorVT(VT))
+    if (useSVEForFixedLengthVectorVT(VT, true))
       return LowerFixedLengthVectorStoreToSVE(Op, DAG);

     unsigned AS = StoreNode->getAddressSpace();
     Align Alignment = StoreNode->getAlign();
     if (Alignment < MemVT.getStoreSize() &&
         !allowsMisalignedMemoryAccesses(MemVT, AS, Alignment,
                                         StoreNode->getMemOperand()->getFlags(),
                                         nullptr)) {
       return scalarizeVectorStore(StoreNode, DAG);
     }

-    if (StoreNode->isTruncatingStore()) {
+    if (StoreNode->isTruncatingStore() && VT == MVT::v4i16 &&
+        MemVT == MVT::v4i8) {
       return LowerTruncateVectorStore(Dl, StoreNode, VT, MemVT, DAG);
     }
     // 256 bit non-temporal stores can be lowered to STNP. Do this as part of
     // the custom lowering, as there are no un-paired non-temporal stores and
     // legalization will break up 256 bit inputs.
     ElementCount EC = MemVT.getVectorElementCount();
     if (StoreNode->isNonTemporal() && MemVT.getSizeInBits() == 256u &&
         EC.isKnownEven() &&
[10,564 lines hidden; context: static bool performTBISimplification(SDValue Addr,]
   const TargetLowering &TLI = DAG.getTargetLoweringInfo();
   if (TLI.SimplifyDemandedBits(Addr, DemandedMask, Known, TLO)) {
     DCI.CommitTargetLoweringOpt(TLO);
     return true;
   }
   return false;
 }

+static SDValue foldTruncStoreOfExt(SelectionDAG &DAG, SDNode *N) {
+  auto OpCode = N->getOpcode();
+  assert(OpCode == ISD::STORE ||
+         OpCode == ISD::MSTORE && "Expected STORE dag node in input!");
+
+  if (auto Store = dyn_cast<StoreSDNode>(N)) {
+    if (!Store->isTruncatingStore())
+      return SDValue();
+    SDValue Ext = Store->getValue();
+    auto ExtOpCode = Ext.getOpcode();
+    if (ExtOpCode != ISD::ZERO_EXTEND && ExtOpCode != ISD::SIGN_EXTEND &&
+        ExtOpCode != ISD::ANY_EXTEND)
+      return SDValue();
+    SDValue Orig = Ext->getOperand(0);
+    if (Store->getMemoryVT() != Orig->getValueType(0))
+      return SDValue();
+    return DAG.getStore(Store->getChain(), SDLoc(Store), Orig,
+                        Store->getBasePtr(), Store->getPointerInfo(),
+                        Store->getAlign());
+  }
+
+  return SDValue();
+}
+
 static SDValue performSTORECombine(SDNode *N,
                                    TargetLowering::DAGCombinerInfo &DCI,
                                    SelectionDAG &DAG,
                                    const AArch64Subtarget *Subtarget) {
   if (SDValue Split = splitStores(N, DCI, DAG, Subtarget))
     return Split;

   if (Subtarget->supportsAddressTopByteIgnored() &&
       performTBISimplification(N->getOperand(2), DCI, DAG))
     return SDValue(N, 0);

+  if (SDValue Store = foldTruncStoreOfExt(DAG, N))
+    return Store;
+
   return SDValue();
 }

 /// Target-specific DAG combine function for NEON load/store intrinsics
 /// to merge base address updates.
 static SDValue performNEONPostLDSTCombine(SDNode *N,
                                           TargetLowering::DAGCombinerInfo &DCI,
                                           SelectionDAG &DAG) {
[2,703 lines hidden]
 }

 SDValue AArch64TargetLowering::LowerFixedLengthVectorMStoreToSVE(
     SDValue Op, SelectionDAG &DAG) const {
   auto Store = cast<MaskedStoreSDNode>(Op);

   if (Store->isTruncatingStore())
     return SDValue();

   SDLoc DL(Op);
   EVT VT = Store->getValue().getValueType();
   EVT ContainerVT = getContainerForFixedLengthVector(DAG, VT);

   auto NewValue = convertToScalableVector(DAG, ContainerVT, Store->getValue());
   SDValue Mask = convertFixedMaskToScalableVector(Store->getMask(), DAG);

   return DAG.getMaskedStore(
[710 lines hidden below]

llvm/lib/Target/AMDGPU/R600ISelLowering.h

[48 lines hidden above; context: public:]
   bool canMergeStoresTo(unsigned AS, EVT MemVT,
                         const SelectionDAG &DAG) const override;

   bool allowsMisalignedMemoryAccesses(
       EVT VT, unsigned AS, Align Alignment,
       MachineMemOperand::Flags Flags = MachineMemOperand::MONone,
       bool *IsFast = nullptr) const override;

+  virtual bool canCombineTruncStore(EVT ValVT, EVT MemVT,
+                                    bool LegalOperations) const override {
+    // R600 has "custom" lowering for truncating stores despite not supporting
+    // those instructions. If we allow that custom lowering in the DAG combiner
+    // then all truncates are merged into truncating stores, giving worse code
+    // generation. This hook prevents the DAG combiner performing that combine.
+    return isTruncStoreLegal(ValVT, MemVT);
+  }
+
 private:
   unsigned Gen;
   /// Each OpenCL kernel has nine implicit parameters that are stored in the
   /// first nine dwords of a Vertex Buffer. These implicit parameters are
   /// lowered to load instructions which retrieve the values from the Vertex
   /// Buffer.
   SDValue LowerImplicitParameter(SelectionDAG &DAG, EVT VT, const SDLoc &DL,
                                  unsigned DwordOffset) const;
[47 lines hidden below]

llvm/test/CodeGen/AArch64/sve-fixed-length-masked-gather.ll

[30 lines hidden above]
 ; CHECK-NEXT: ldr q[[PTRS:[0-9]+]], [x1]
 ; CHECK-NEXT: ptrue [[PG0:p[0-9]+]].s, vl2
 ; CHECK-NEXT: fmov s[[VALS:[0-9]+]], [[VALS_LO]]
 ; CHECK-NEXT: mov v[[VALS]].s[1], [[VALS_HI]]
 ; CHECK-NEXT: cmeq v[[CMP:[0-9]+]].2s, v[[VALS]].2s, #0
 ; CHECK-NEXT: cmpne [[MASK:p[0-9]+]].s, [[PG0]]/z, z[[CMP]].s, #0
 ; CHECK-NEXT: ld1sb { z[[RES:[0-9]+]].d }, [[MASK]]/z, [z[[PTRS]].d]
 ; CHECK-NEXT: xtn v[[XTN:[0-9]+]].2s, v[[RES]].2d
-; CHECK-NEXT: mov [[RES_HI:w[0-9]+]], v[[XTN]].s[1]
-; CHECK-NEXT: fmov [[RES_LO:w[0-9]+]], s[[XTN]]
-; CHECK-NEXT: strb [[RES_LO]], [x0]
-; CHECK-NEXT: strb [[RES_HI]], [x0, #1]
+; CHECK-NEXT: st1b { z[[XTN]].s }, [[PG0]], [x0]
 ; CHECK-NEXT: ret
   %cval = load <2 x i8>, <2 x i8>* %a
   %ptrs = load <2 x i8>, <2 x i8>* %b
   %mask = icmp eq <2 x i8> %cval, zeroinitializer
   %vals = call <2 x i8> @llvm.masked.gather.v2i8(<2 x i8*> %ptrs, i32 8, <2 x i1> %mask, <2 x i8> undef)
   store <2 x i8> %vals, <2 x i8>* %a
   ret void
 }

 define void @masked_gather_v4i8(<4 x i8>* %a, <4 x i8> %b) #0 {
 ; CHECK-LABEL: masked_gather_v4i8:
 ; CHECK: ldr s[[VALS:[0-9]+]], [x0]
 ; CHECK-NEXT: ptrue [[PG0:p[0-9]+]].d, vl4
 ; CHECK-NEXT: ld1d { [[PTRS:z[0-9]+]].d }, [[PG0]]/z, [x1]
	; CHECK-NEXT: ptrue [[PG1:p[0-9]+]].h, vl4			; CHECK-NEXT: ptrue [[PG1:p[0-9]+]].h, vl4
	; CHECK-NEXT: ushll [[SHL:v[0-9]+]].8h, v[[VALS]].8b, #0			; CHECK-NEXT: ushll [[SHL:v[0-9]+]].8h, v[[VALS]].8b, #0
	; CHECK-NEXT: cmeq v[[CMP:[0-9]+]].4h, [[SHL]].4h, #0			; CHECK-NEXT: cmeq v[[CMP:[0-9]+]].4h, [[SHL]].4h, #0
	; CHECK-NEXT: cmpne [[MASK:p[0-9]+]].h, [[PG1]]/z, z[[CMP]].h, #0			; CHECK-NEXT: cmpne [[MASK:p[0-9]+]].h, [[PG1]]/z, z[[CMP]].h, #0
	; CHECK-NEXT: ld1sb { [[RES:z[0-9]+]].d }, [[MASK]]/z, {{\[}}[[PTRS]].d]			; CHECK-NEXT: ld1sb { [[RES:z[0-9]+]].d }, [[MASK]]/z, {{\[}}[[PTRS]].d]
	; CHECK-NEXT: uzp1 [[UZP1:z[0-9]+]].s, [[RES]].s, [[RES]].s			; CHECK-NEXT: uzp1 [[UZP1:z[0-9]+]].s, [[RES]].s, [[RES]].s
	; CHECK-NEXT: uzp1 z[[UZP2:[0-9]+]].h, [[UZP1]].h, [[UZP1]].h			; CHECK-NEXT: uzp1 z[[UZP2:[0-9]+]].h, [[UZP1]].h, [[UZP1]].h
	; CHECK-NEXT: uzp1 v[[UZP3:[0-9]+]].8b, v[[UZP2]].8b, v[[UZP2]].8b			; CHECK-NEXT: st1b { z[[UZP2]].h }, [[PG0]], [x0]
	; CHECK-NEXT: str s[[UZP3]], [x0]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%cval = load <4 x i8>, <4 x i8>* %a			%cval = load <4 x i8>, <4 x i8>* %a
	%ptrs = load <4 x i8>, <4 x i8>* %b			%ptrs = load <4 x i8>, <4 x i8>* %b
	%mask = icmp eq <4 x i8> %cval, zeroinitializer			%mask = icmp eq <4 x i8> %cval, zeroinitializer
	%vals = call <4 x i8> @llvm.masked.gather.v4i8(<4 x i8*> %ptrs, i32 8, <4 x i1> %mask, <4 x i8> undef)			%vals = call <4 x i8> @llvm.masked.gather.v4i8(<4 x i8*> %ptrs, i32 8, <4 x i1> %mask, <4 x i8> undef)
	store <4 x i8> %vals, <4 x i8>* %a			store <4 x i8> %vals, <4 x i8>* %a
	ret void			ret void
	}			}
	[73 lines not shown]
	; VBITS_GE_2048-NEXT: ld1b { [[VALS:z[0-9]+]].b }, [[PG0]]/z, [x0]			; VBITS_GE_2048-NEXT: ld1b { [[VALS:z[0-9]+]].b }, [[PG0]]/z, [x0]
	; VBITS_GE_2048-NEXT: ptrue [[PG1:p[0-9]+]].d, vl32			; VBITS_GE_2048-NEXT: ptrue [[PG1:p[0-9]+]].d, vl32
	; VBITS_GE_2048-NEXT: ld1d { [[PTRS:z[0-9]+]].d }, [[PG1]]/z, [x1]			; VBITS_GE_2048-NEXT: ld1d { [[PTRS:z[0-9]+]].d }, [[PG1]]/z, [x1]
	; VBITS_GE_2048-NEXT: cmpeq [[MASK:p[0-9]+]].b, [[PG0]]/z, [[VALS]].b, #0			; VBITS_GE_2048-NEXT: cmpeq [[MASK:p[0-9]+]].b, [[PG0]]/z, [[VALS]].b, #0
	; VBITS_GE_2048-NEXT: ld1b { [[RES:z[0-9]+]].d }, [[MASK]]/z, {{\[}}[[PTRS]].d]			; VBITS_GE_2048-NEXT: ld1b { [[RES:z[0-9]+]].d }, [[MASK]]/z, {{\[}}[[PTRS]].d]
	; VBITS_GE_2048-NEXT: uzp1 [[UZP1:z[0-9]+]].s, [[RES]].s, [[RES]].s			; VBITS_GE_2048-NEXT: uzp1 [[UZP1:z[0-9]+]].s, [[RES]].s, [[RES]].s
	; VBITS_GE_2048-NEXT: uzp1 [[UZP2:z[0-9]+]].h, [[UZP1]].h, [[UZP1]].h			; VBITS_GE_2048-NEXT: uzp1 [[UZP2:z[0-9]+]].h, [[UZP1]].h, [[UZP1]].h
	; VBITS_GE_2048-NEXT: uzp1 [[UZP3:z[0-9]+]].b, [[UZP2]].b, [[UZP2]].b			; VBITS_GE_2048-NEXT: uzp1 [[UZP3:z[0-9]+]].b, [[UZP2]].b, [[UZP2]].b
	; VBITS_GE_2048-NEXT: st1b { [[UZP3]].b }, [[PG0]], [x0]			; VBITS_GE_2048-NEXT: st1b { [[UZP3]].b }, [[PG0]], [x0]
	; VBITS_GE_2048-NEXT: ret			; VBITS_GE_2048-NEXT: ret
				bsmith (inline comment, Done): This test change looks incorrect.
	%cval = load <32 x i8>, <32 x i8>* %a			%cval = load <32 x i8>, <32 x i8>* %a
	%ptrs = load <32 x i8>, <32 x i8>* %b			%ptrs = load <32 x i8>, <32 x i8>* %b
	%mask = icmp eq <32 x i8> %cval, zeroinitializer			%mask = icmp eq <32 x i8> %cval, zeroinitializer
	%vals = call <32 x i8> @llvm.masked.gather.v32i8(<32 x i8*> %ptrs, i32 8, <32 x i1> %mask, <32 x i8> undef)			%vals = call <32 x i8> @llvm.masked.gather.v32i8(<32 x i8*> %ptrs, i32 8, <32 x i1> %mask, <32 x i8> undef)
	store <32 x i8> %vals, <32 x i8>* %a			store <32 x i8> %vals, <32 x i8>* %a
	ret void			ret void
	}			}

	;			;
	; LD1H			; LD1H
	;			;

	define void @masked_gather_v2i16(<2 x i16>* %a, <2 x i16> %b) #0 {			define void @masked_gather_v2i16(<2 x i16>* %a, <2 x i16> %b) #0 {
	; CHECK-LABEL: masked_gather_v2i16:			; CHECK-LABEL: masked_gather_v2i16:
	; CHECK: ldrh [[VALS_LO:w[0-9]+]], [x0]			; CHECK: ldrh [[VALS_LO:w[0-9]+]], [x0]
	; CHECK-NEXT: ldrh [[VALS_HI:w[0-9]+]], [x0, #2]			; CHECK-NEXT: ldrh [[VALS_HI:w[0-9]+]], [x0, #2]
	; CHECK-NEXT: ldr q[[PTRS:[0-9]+]], [x1]			; CHECK-NEXT: ldr q[[PTRS:[0-9]+]], [x1]
	; CHECK-NEXT: ptrue [[PG0:p[0-9]+]].s, vl2			; CHECK-NEXT: ptrue [[PG0:p[0-9]+]].s, vl2
	; CHECK-NEXT: fmov s[[VALS:[0-9]+]], [[VALS_LO]]			; CHECK-NEXT: fmov s[[VALS:[0-9]+]], [[VALS_LO]]
	; CHECK-NEXT: mov v[[VALS]].s[1], [[VALS_HI]]			; CHECK-NEXT: mov v[[VALS]].s[1], [[VALS_HI]]
	; CHECK-NEXT: cmeq v[[CMP:[0-9]+]].2s, v[[VALS]].2s, #0			; CHECK-NEXT: cmeq v[[CMP:[0-9]+]].2s, v[[VALS]].2s, #0
	; CHECK-NEXT: cmpne [[MASK:p[0-9]+]].s, [[PG0]]/z, z[[CMP]].s, #0			; CHECK-NEXT: cmpne [[MASK:p[0-9]+]].s, [[PG0]]/z, z[[CMP]].s, #0
	; CHECK-NEXT: ld1sh { z[[RES:[0-9]+]].d }, [[MASK]]/z, [z[[PTRS]].d]			; CHECK-NEXT: ld1sh { z[[RES:[0-9]+]].d }, [[MASK]]/z, [z[[PTRS]].d]
	; CHECK-NEXT: xtn v[[XTN:[0-9]+]].2s, v[[RES]].2d			; CHECK-NEXT: xtn v[[XTN:[0-9]+]].2s, v[[RES]].2d
	; CHECK-NEXT: mov [[RES_HI:w[0-9]+]], v[[XTN]].s[1]			; CHECK-NEXT: st1h { z[[RES]].s }, [[PG0]], [x0]
	; CHECK-NEXT: fmov [[RES_LO:w[0-9]+]], s[[XTN]]
	; CHECK-NEXT: strh [[RES_LO]], [x0]
	; CHECK-NEXT: strh [[RES_HI]], [x0, #2]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%cval = load <2 x i16>, <2 x i16>* %a			%cval = load <2 x i16>, <2 x i16>* %a
	%ptrs = load <2 x i16>, <2 x i16>* %b			%ptrs = load <2 x i16>, <2 x i16>* %b
	%mask = icmp eq <2 x i16> %cval, zeroinitializer			%mask = icmp eq <2 x i16> %cval, zeroinitializer
	%vals = call <2 x i16> @llvm.masked.gather.v2i16(<2 x i16*> %ptrs, i32 8, <2 x i1> %mask, <2 x i16> undef)			%vals = call <2 x i16> @llvm.masked.gather.v2i16(<2 x i16*> %ptrs, i32 8, <2 x i1> %mask, <2 x i16> undef)
	store <2 x i16> %vals, <2 x i16>* %a			store <2 x i16> %vals, <2 x i16>* %a
	ret void			ret void
	}			}
	[927 lines not shown]

llvm/test/CodeGen/AArch64/sve-fixed-length-trunc-stores.ll

This file was added.
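
The `-D#VBYTES` values passed to FileCheck in the RUN lines of this file are consistent with rounding `-aarch64-sve-vector-bits-min` down to a power of two and dividing by eight (for example 384 gives 32, and 1920 gives 128). The sketch below only reproduces that arithmetic; it is an observation about the listed values, not a description of how the backend derives register sizes. The function names are hypothetical.

```cpp
#include <cstdint>

// Largest power of two that is <= Bits. Bits must be nonzero.
inline std::uint32_t floorPowerOfTwo(std::uint32_t Bits) {
  std::uint32_t P = 1;
  while (P <= Bits / 2)
    P *= 2;
  return P;
}

// Byte count matching the VBYTES values used in the RUN lines.
inline std::uint32_t vbytesFor(std::uint32_t MinVectorBits) {
  return floorPowerOfTwo(MinVectorBits) / 8;
}
```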

				; RUN: llc -aarch64-sve-vector-bits-min=128 < %s \| FileCheck %s -D#VBYTES=16 -check-prefix=NO_SVE
				; RUN: llc -aarch64-sve-vector-bits-min=256 < %s \| FileCheck %s -D#VBYTES=32 -check-prefixes=CHECK,VBITS_EQ_256
				; RUN: llc -aarch64-sve-vector-bits-min=384 < %s \| FileCheck %s -D#VBYTES=32 -check-prefixes=CHECK
				; RUN: llc -aarch64-sve-vector-bits-min=512 < %s \| FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512
				; RUN: llc -aarch64-sve-vector-bits-min=640 < %s \| FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512
				; RUN: llc -aarch64-sve-vector-bits-min=768 < %s \| FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512
				; RUN: llc -aarch64-sve-vector-bits-min=896 < %s \| FileCheck %s -D#VBYTES=64 -check-prefixes=CHECK,VBITS_GE_512
				; RUN: llc -aarch64-sve-vector-bits-min=1024 < %s \| FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512
				; RUN: llc -aarch64-sve-vector-bits-min=1152 < %s \| FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512
				; RUN: llc -aarch64-sve-vector-bits-min=1280 < %s \| FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512
				; RUN: llc -aarch64-sve-vector-bits-min=1408 < %s \| FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512
				; RUN: llc -aarch64-sve-vector-bits-min=1536 < %s \| FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512
				; RUN: llc -aarch64-sve-vector-bits-min=1664 < %s \| FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512
				; RUN: llc -aarch64-sve-vector-bits-min=1792 < %s \| FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512
				; RUN: llc -aarch64-sve-vector-bits-min=1920 < %s \| FileCheck %s -D#VBYTES=128 -check-prefixes=CHECK,VBITS_GE_1024,VBITS_GE_512
				; RUN: llc -aarch64-sve-vector-bits-min=2048 < %s \| FileCheck %s -D#VBYTES=256 -check-prefixes=CHECK,VBITS_GE_2048,VBITS_GE_1024,VBITS_GE_512

				target triple = "aarch64-unknown-linux-gnu"

				; Don't use SVE when its registers are no bigger than NEON.
				; NO_SVE-NOT: ptrue

				efriedma (inline comment, Not Done): Why don't we want to optimize store_trunc_v2i64i8 when SVE registers are 128 bits wide?
				define void @store_trunc_v2i64i8(<2 x i64>* %ap, <2 x i8>* %dest) #0 {
				; CHECK-LABEL: store_trunc_v2i64i8
				; CHECK: ldr q[[Q0:[0-9]+]], [x0]
				; CHECK: ptrue p[[P0:[0-9]+]].d, vl2
				; CHECK-NEXT: st1b { z[[Q0]].d }, p[[P0]], [x{{[0-9]+}}]
				; CHECK-NEXT: ret
				%a = load <2 x i64>, <2 x i64>* %ap
				%val = trunc <2 x i64> %a to <2 x i8>
				store <2 x i8> %val, <2 x i8>* %dest
				ret void
				}

				define void @store_trunc_v4i64i8(<4 x i64>* %ap, <4 x i8>* %dest) #0 {
				; CHECK-LABEL: store_trunc_v4i64i8
				; CHECK: ptrue p[[P0:[0-9]+]].d, vl4
				; CHECK-NEXT: ld1d { [[Z0:z[0-9]+]].d }, p0/z, [x0]
				; CHECK-NEXT: st1b { [[Z0]].d }, p[[P0]], [x{{[0-9]+}}]
				; CHECK-NEXT: ret
				%a = load <4 x i64>, <4 x i64>* %ap
				%val = trunc <4 x i64> %a to <4 x i8>
				store <4 x i8> %val, <4 x i8>* %dest
				ret void
				}

				define void @store_trunc_v8i64i8(<8 x i64>* %ap, <8 x i8>* %dest) #0 {
				; CHECK-LABEL: store_trunc_v8i64i8:
				; VBITS_GE_512: ptrue p[[P0:[0-9]+]].d, vl8
				; VBITS_GE_512-NEXT: ld1d { [[Z0:z[0-9]+]].d }, p0/z, [x0]
				; VBITS_GE_512-NEXT: st1b { [[Z0]].d }, p[[P0]], [x{{[0-9]+}}]
				; VBITS_GE_512-NEXT: ret

				; Ensure sensible type legalisation
				; VBITS_EQ_256-DAG: ptrue [[PG:p[0-9]+]].d, vl4
				; VBITS_EQ_256-DAG: ld1d { [[Z0:z[0-9]+]].d }, [[PG]]/z, [x8]
				; VBITS_EQ_256-DAG: ld1d { [[Z1:z[0-9]+]].d }, [[PG]]/z, [x0]
				; VBITS_EQ_256-DAG: ptrue [[PG]].s, vl4
				; VBITS_EQ_256-DAG: uzp1 [[Z0]].s, [[Z0]].s, [[Z0]].s
				; VBITS_EQ_256-DAG: uzp1 [[Z1]].s, [[Z1]].s, [[Z1]].s
				; VBITS_EQ_256-DAG: splice [[Z1]].s, [[PG]], [[Z1]].s, [[Z0]].s
				; VBITS_EQ_256-DAG: ptrue [[PG]].s, vl8
				; VBITS_EQ_256-DAG: st1b { [[Z1]].s }, [[PG]], [x1]
				; VBITS_EQ_256-DAG: ret
				%a = load <8 x i64>, <8 x i64>* %ap
				%val = trunc <8 x i64> %a to <8 x i8>
				store <8 x i8> %val, <8 x i8>* %dest
				ret void
				}

				define void @store_trunc_v16i64i8(<16 x i64>* %ap, <16 x i8>* %dest) #0 {
				; CHECK-LABEL: store_trunc_v16i64i8:
				; VBITS_GE_1024: ptrue p[[P0:[0-9]+]].d, vl16
				; VBITS_GE_1024-NEXT: ld1d { [[Z0:z[0-9]+]].d }, p0/z, [x0]
				; VBITS_GE_1024-NEXT: st1b { [[Z0]].d }, p[[P0]], [x{{[0-9]+}}]
				; VBITS_GE_1024-NEXT: ret
				%a = load <16 x i64>, <16 x i64>* %ap
				%val = trunc <16 x i64> %a to <16 x i8>
				store <16 x i8> %val, <16 x i8>* %dest
				ret void
				}

				define void @store_trunc_v32i64i8(<32 x i64>* %ap, <32 x i8>* %dest) #0 {
				; CHECK-LABEL: store_trunc_v32i64i8:
				; VBITS_GE_2048: ptrue p[[P0:[0-9]+]].d, vl32
				; VBITS_GE_2048-NEXT: ld1d { [[Z0:z[0-9]+]].d }, p0/z, [x0]
				; VBITS_GE_2048-NEXT: st1b { [[Z0]].d }, p[[P0]], [x{{[0-9]+}}]
				; VBITS_GE_2048-NEXT: ret
				%a = load <32 x i64>, <32 x i64>* %ap
				%val = trunc <32 x i64> %a to <32 x i8>
				store <32 x i8> %val, <32 x i8>* %dest
				ret void
				}

				define void @store_trunc_v8i64i16(<8 x i64>* %ap, <8 x i16>* %dest) #0 {
				; CHECK-LABEL: store_trunc_v8i64i16:
				; VBITS_GE_512: ptrue p[[P0:[0-9]+]].d, vl8
				; VBITS_GE_512-NEXT: ld1d { [[Z0:z[0-9]+]].d }, p0/z, [x0]
				; VBITS_GE_512-NEXT: st1h { [[Z0]].d }, p[[P0]], [x{{[0-9]+}}]
				; VBITS_GE_512-NEXT: ret

				; Ensure sensible type legalisation.
				; Currently does not use the truncating store
				; VBITS_EQ_256-DAG: ptrue [[PG:p[0-9]+]].d, vl4
				; VBITS_EQ_256-DAG: ld1d { [[Z0:z[0-9]+]].d }, [[PG]]/z, [x8]
				; VBITS_EQ_256-DAG: ld1d { [[Z1:z[0-9]+]].d }, [[PG]]/z, [x0]
				; VBITS_EQ_256-DAG: uzp1 [[Z0]].s, [[Z0]].s, [[Z0]].s
				; VBITS_EQ_256-DAG: uzp1 [[Z1]].s, [[Z1]].s, [[Z1]].s
				; VBITS_EQ_256-DAG: uzp1 [[Z1]].h, [[Z1]].h, [[Z1]].h
				; VBITS_EQ_256-DAG: uzp1 [[Z0]].h, [[Z0]].h, [[Z0]].h
				; VBITS_EQ_256-DAG: mov v[[V0:[0-9]+]].d[1], v{{[0-9]+}}.d[0]
				; VBITS_EQ_256-DAG: str q[[V0]], [x1]
				; VBITS_EQ_256-DAG: ret
				%a = load <8 x i64>, <8 x i64>* %ap
				%val = trunc <8 x i64> %a to <8 x i16>
				store <8 x i16> %val, <8 x i16>* %dest
				ret void
				}

				define void @store_trunc_v8i64i32(<8 x i64>* %ap, <8 x i32>* %dest) #0 {
				; CHECK-LABEL: store_trunc_v8i64i32:
				; VBITS_GE_512: ptrue p[[P0:[0-9]+]].d, vl8
				; VBITS_GE_512-NEXT: ld1d { [[Z0:z[0-9]+]].d }, p0/z, [x0]
				; VBITS_GE_512-NEXT: st1w { [[Z0]].d }, p[[P0]], [x{{[0-9]+}}]
				; VBITS_GE_512-NEXT: ret

				; Ensure sensible type legalisation
				; VBITS_EQ_256-DAG: ptrue [[PG:p[0-9]+]].d, vl4
				; VBITS_EQ_256-DAG: ld1d { [[Z0:z[0-9]+]].d }, [[PG]]/z, [x8]
				; VBITS_EQ_256-DAG: ld1d { [[Z1:z[0-9]+]].d }, [[PG]]/z, [x0]
				; VBITS_EQ_256-DAG: ptrue [[PG]].s, vl4
				; VBITS_EQ_256-DAG: uzp1 [[Z0]].s, [[Z0]].s, [[Z0]].s
				; VBITS_EQ_256-DAG: uzp1 [[Z1]].s, [[Z1]].s, [[Z1]].s
				; VBITS_EQ_256-DAG: splice [[Z1]].s, [[PG]], [[Z1]].s, [[Z0]].s
				; VBITS_EQ_256-DAG: ptrue [[PG]].s, vl8
				; VBITS_EQ_256-DAG: st1w { [[Z1]].s }, [[PG]], [x1]
				; VBITS_EQ_256-DAG: ret
				%a = load <8 x i64>, <8 x i64>* %ap
				%val = trunc <8 x i64> %a to <8 x i32>
				store <8 x i32> %val, <8 x i32>* %dest
				ret void
				}

				define void @store_trunc_v16i32i8(<16 x i32>* %ap, <16 x i8>* %dest) #0 {
				; CHECK-LABEL: store_trunc_v16i32i8:
				; VBITS_GE_512: ptrue p[[P0:[0-9]+]].s, vl16
				; VBITS_GE_512-NEXT: ld1w { [[Z0:z[0-9]+]].s }, p0/z, [x0]
				; VBITS_GE_512-NEXT: st1b { [[Z0]].s }, p[[P0]], [x{{[0-9]+}}]
				; VBITS_GE_512-NEXT: ret

				; Ensure sensible type legalisation.
				; Currently does not use the truncating store
				; VBITS_EQ_256-DAG: ptrue [[PG:p[0-9]+]].s, vl8
				; VBITS_EQ_256-DAG: ld1w { [[Z0:z[0-9]+]].s }, [[PG]]/z, [x8]
				; VBITS_EQ_256-DAG: ld1w { [[Z1:z[0-9]+]].s }, [[PG]]/z, [x0]
				; VBITS_EQ_256-DAG: uzp1 [[Z0]].h, [[Z0]].h, [[Z0]].h
				; VBITS_EQ_256-DAG: uzp1 [[Z1]].h, [[Z1]].h, [[Z1]].h
				; VBITS_EQ_256-DAG: uzp1 [[Z1]].b, [[Z1]].b, [[Z1]].b
				; VBITS_EQ_256-DAG: uzp1 [[Z0]].b, [[Z0]].b, [[Z0]].b
				; VBITS_EQ_256-DAG: mov v[[V0:[0-9]+]].d[1], v{{[0-9]+}}.d[0]
				; VBITS_EQ_256-DAG: str q[[V0]], [x1]
				; VBITS_EQ_256-DAG: ret
				%a = load <16 x i32>, <16 x i32>* %ap
				%val = trunc <16 x i32> %a to <16 x i8>
				store <16 x i8> %val, <16 x i8>* %dest
				ret void
				}

				define void @store_trunc_v16i32i16(<16 x i32>* %ap, <16 x i16>* %dest) #0 {
				; CHECK-LABEL: store_trunc_v16i32i16:
				; VBITS_GE_512: ptrue p[[P0:[0-9]+]].s, vl16
				; VBITS_GE_512-NEXT: ld1w { [[Z0:z[0-9]+]].s }, p0/z, [x0]
				; VBITS_GE_512-NEXT: st1h { [[Z0]].s }, p[[P0]], [x{{[0-9]+}}]
				; VBITS_GE_512-NEXT: ret

				; Ensure sensible type legalisation
				; VBITS_EQ_256-DAG: ptrue [[PG:p[0-9]+]].s, vl8
				; VBITS_EQ_256-DAG: ld1w { [[Z0:z[0-9]+]].s }, [[PG]]/z, [x8]
				; VBITS_EQ_256-DAG: ld1w { [[Z1:z[0-9]+]].s }, [[PG]]/z, [x0]
				; VBITS_EQ_256-DAG: ptrue [[PG]].h, vl8
				; VBITS_EQ_256-DAG: uzp1 [[Z0]].h, [[Z0]].h, [[Z0]].h
				; VBITS_EQ_256-DAG: uzp1 [[Z1]].h, [[Z1]].h, [[Z1]].h
				; VBITS_EQ_256-DAG: splice [[Z1]].h, [[PG]], [[Z1]].h, [[Z0]].h
				; VBITS_EQ_256-DAG: ptrue [[PG]].h, vl16
				efriedma (inline comment, Not Done): Can we "uzp1 z0, z0, z1" or something like that in store_trunc_v16i32i16?
				DavidTruby (author, inline comment, Done): The code generation of these legalized types is poor and needs fixing in a separate patch, as these should really be using truncating stores as well. These tests just check that legalised code is correct rather than good at the moment.
				; VBITS_EQ_256-DAG: st1h { [[Z1]].h }, [[PG]], [x1]
				; VBITS_EQ_256-DAG: ret
				%a = load <16 x i32>, <16 x i32>* %ap
				%val = trunc <16 x i32> %a to <16 x i16>
				store <16 x i16> %val, <16 x i16>* %dest
				ret void
				}

				define void @store_trunc_v32i16i8(<32 x i16>* %ap, <32 x i8>* %dest) #0 {
				; CHECK-LABEL: store_trunc_v32i16i8:
				; VBITS_GE_512: ptrue p[[P0:[0-9]+]].h, vl32
				; VBITS_GE_512-NEXT: ld1h { [[Z0:z[0-9]+]].h }, p0/z, [x0]
				; VBITS_GE_512-NEXT: st1b { [[Z0]].h }, p[[P0]], [x{{[0-9]+}}]
				; VBITS_GE_512-NEXT: ret

				; Ensure sensible type legalisation
				; VBITS_EQ_256-DAG: ptrue [[PG:p[0-9]+]].h, vl16
				; VBITS_EQ_256-DAG: ld1h { [[Z0:z[0-9]+]].h }, [[PG]]/z, [x8]
				; VBITS_EQ_256-DAG: ld1h { [[Z1:z[0-9]+]].h }, [[PG]]/z, [x0]
				; VBITS_EQ_256-DAG: ptrue [[PG]].b, vl16
				; VBITS_EQ_256-DAG: uzp1 [[Z0]].b, [[Z0]].b, [[Z0]].b
				; VBITS_EQ_256-DAG: uzp1 [[Z1]].b, [[Z1]].b, [[Z1]].b
				; VBITS_EQ_256-DAG: splice [[Z1]].b, [[PG]], [[Z1]].b, [[Z0]].b
				; VBITS_EQ_256-DAG: ptrue [[PG]].b, vl32
				; VBITS_EQ_256-DAG: st1b { [[Z1]].b }, [[PG]], [x1]
				; VBITS_EQ_256-DAG: ret
				%a = load <32 x i16>, <32 x i16>* %ap
				%val = trunc <32 x i16> %a to <32 x i8>
				store <32 x i8> %val, <32 x i8>* %dest
				ret void
				}


				attributes #0 = { "target-features"="+sve" }
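
For readers less familiar with truncating stores, the tests above expect a wide-element load followed by a narrow-element store such as `st1b { z.d }`, with no separate narrowing instruction in between. A scalar model of what such a store writes might look like the sketch below; `truncStoreI64ToI8` is a hypothetical helper, not part of the patch, and it ignores predication and alignment.

```cpp
#include <cstdint>
#include <vector>

// Scalar model of a truncating store from i64 lanes to i8 memory:
// each lane contributes only its low 8 bits, written contiguously.
inline std::vector<std::uint8_t>
truncStoreI64ToI8(const std::vector<std::uint64_t> &Lanes) {
  std::vector<std::uint8_t> Dest;
  Dest.reserve(Lanes.size());
  for (std::uint64_t Lane : Lanes)
    Dest.push_back(static_cast<std::uint8_t>(Lane)); // keeps Lane mod 256
  return Dest;
}
```

This corresponds to the IR pattern in the tests, a `trunc` from `<N x i64>` to `<N x i8>` followed by a `store` of the narrowed vector.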

llvm/test/CodeGen/Mips/cconv/byval.ll

	[249 lines not shown]
	; N32-NEXT: addu $1, $sp, $1			; N32-NEXT: addu $1, $sp, $1
	; N32-NEXT: sd $ra, 8($1) # 8-byte Folded Spill			; N32-NEXT: sd $ra, 8($1) # 8-byte Folded Spill
	; N32-NEXT: lui $1, 1			; N32-NEXT: lui $1, 1
	; N32-NEXT: addu $1, $sp, $1			; N32-NEXT: addu $1, $sp, $1
	; N32-NEXT: sd $16, 0($1) # 8-byte Folded Spill			; N32-NEXT: sd $16, 0($1) # 8-byte Folded Spill
	; N32-NEXT: .cfi_offset 31, -8			; N32-NEXT: .cfi_offset 31, -8
	; N32-NEXT: .cfi_offset 16, -16			; N32-NEXT: .cfi_offset 16, -16
	; N32-NEXT: move $5, $4			; N32-NEXT: move $5, $4
	; N32-NEXT: sll $1, $5, 0			; N32-NEXT: lui $1, 1
	; N32-NEXT: lui $2, 1			; N32-NEXT: addu $1, $sp, $1
	; N32-NEXT: addu $2, $sp, $2			; N32-NEXT: sw $4, -4($1)
	; N32-NEXT: sw $1, -4($2)
	; N32-NEXT: addiu $16, $sp, 8			; N32-NEXT: addiu $16, $sp, 8
	; N32-NEXT: ori $6, $zero, 65520			; N32-NEXT: ori $6, $zero, 65520
	; N32-NEXT: jal memcpy			; N32-NEXT: jal memcpy
	; N32-NEXT: move $4, $16			; N32-NEXT: move $4, $16
	; N32-NEXT: addiu $5, $16, 64			; N32-NEXT: addiu $5, $16, 64
	; N32-NEXT: ori $1, $zero, 65456			; N32-NEXT: ori $1, $zero, 65456
	; N32-NEXT: subu $sp, $sp, $1			; N32-NEXT: subu $sp, $sp, $1
	; N32-NEXT: ori $6, $zero, 65456			; N32-NEXT: ori $6, $zero, 65456
	[114 lines not shown]
	; O32-NEXT: addiu $sp, $sp, 32			; O32-NEXT: addiu $sp, $sp, 32
	;			;
	; N32-LABEL: g3:			; N32-LABEL: g3:
	; N32: # %bb.0: # %entry			; N32: # %bb.0: # %entry
	; N32-NEXT: addiu $sp, $sp, -16			; N32-NEXT: addiu $sp, $sp, -16
	; N32-NEXT: .cfi_def_cfa_offset 16			; N32-NEXT: .cfi_def_cfa_offset 16
	; N32-NEXT: sd $ra, 8($sp) # 8-byte Folded Spill			; N32-NEXT: sd $ra, 8($sp) # 8-byte Folded Spill
	; N32-NEXT: .cfi_offset 31, -8			; N32-NEXT: .cfi_offset 31, -8
	; N32-NEXT: sll $1, $5, 0			; N32-NEXT: sw $5, 0($sp)
	; N32-NEXT: sw $1, 0($sp)			; N32-NEXT: sw $4, 4($sp)
	; N32-NEXT: sll $1, $4, 0
	; N32-NEXT: sw $1, 4($sp)
	; N32-NEXT: jal memcpy			; N32-NEXT: jal memcpy
	; N32-NEXT: ori $6, $zero, 65520			; N32-NEXT: ori $6, $zero, 65520
	; N32-NEXT: addiu $2, $zero, 4			; N32-NEXT: addiu $2, $zero, 4
	; N32-NEXT: ld $ra, 8($sp) # 8-byte Folded Reload			; N32-NEXT: ld $ra, 8($sp) # 8-byte Folded Reload
	; N32-NEXT: jr $ra			; N32-NEXT: jr $ra
	; N32-NEXT: addiu $sp, $sp, 16			; N32-NEXT: addiu $sp, $sp, 16
	;			;
	; N64-LABEL: g3:			; N64-LABEL: g3:
	[27 lines not shown]

llvm/test/CodeGen/Mips/cconv/vector.ll

	[548 lines not shown]
	; MIPS32R5-NEXT: addiu $sp, $sp, 16			; MIPS32R5-NEXT: addiu $sp, $sp, 16
	; MIPS32R5-NEXT: jr $ra			; MIPS32R5-NEXT: jr $ra
	; MIPS32R5-NEXT: nop			; MIPS32R5-NEXT: nop
	;			;
	; MIPS64R5-LABEL: i8_4:			; MIPS64R5-LABEL: i8_4:
	; MIPS64R5: # %bb.0:			; MIPS64R5: # %bb.0:
	; MIPS64R5-NEXT: daddiu $sp, $sp, -16			; MIPS64R5-NEXT: daddiu $sp, $sp, -16
	; MIPS64R5-NEXT: .cfi_def_cfa_offset 16			; MIPS64R5-NEXT: .cfi_def_cfa_offset 16
	; MIPS64R5-NEXT: sll $1, $5, 0			; MIPS64R5-NEXT: sw $5, 8($sp)
	; MIPS64R5-NEXT: sw $1, 8($sp)			; MIPS64R5-NEXT: sw $4, 12($sp)
	; MIPS64R5-NEXT: sll $1, $4, 0
	; MIPS64R5-NEXT: sw $1, 12($sp)
	; MIPS64R5-NEXT: lbu $1, 9($sp)			; MIPS64R5-NEXT: lbu $1, 9($sp)
	; MIPS64R5-NEXT: lbu $2, 8($sp)			; MIPS64R5-NEXT: lbu $2, 8($sp)
	; MIPS64R5-NEXT: insert.w $w0[0], $2			; MIPS64R5-NEXT: insert.w $w0[0], $2
	; MIPS64R5-NEXT: insert.w $w0[1], $1			; MIPS64R5-NEXT: insert.w $w0[1], $1
	; MIPS64R5-NEXT: lbu $1, 10($sp)			; MIPS64R5-NEXT: lbu $1, 10($sp)
	; MIPS64R5-NEXT: insert.w $w0[2], $1			; MIPS64R5-NEXT: insert.w $w0[2], $1
	; MIPS64R5-NEXT: lbu $1, 11($sp)			; MIPS64R5-NEXT: lbu $1, 11($sp)
	; MIPS64R5-NEXT: insert.w $w0[3], $1			; MIPS64R5-NEXT: insert.w $w0[3], $1
	[689 lines not shown]
	; MIPS32R5EB-NEXT: addiu $sp, $sp, 64			; MIPS32R5EB-NEXT: addiu $sp, $sp, 64
	; MIPS32R5EB-NEXT: jr $ra			; MIPS32R5EB-NEXT: jr $ra
	; MIPS32R5EB-NEXT: nop			; MIPS32R5EB-NEXT: nop
	;			;
	; MIPS64R5-LABEL: i16_2:			; MIPS64R5-LABEL: i16_2:
	; MIPS64R5: # %bb.0:			; MIPS64R5: # %bb.0:
	; MIPS64R5-NEXT: daddiu $sp, $sp, -16			; MIPS64R5-NEXT: daddiu $sp, $sp, -16
	; MIPS64R5-NEXT: .cfi_def_cfa_offset 16			; MIPS64R5-NEXT: .cfi_def_cfa_offset 16
	; MIPS64R5-NEXT: sll $1, $5, 0			; MIPS64R5-NEXT: sw $5, 8($sp)
	; MIPS64R5-NEXT: sw $1, 8($sp)			; MIPS64R5-NEXT: sw $4, 12($sp)
	; MIPS64R5-NEXT: sll $1, $4, 0
	; MIPS64R5-NEXT: sw $1, 12($sp)
	; MIPS64R5-NEXT: lh $1, 10($sp)			; MIPS64R5-NEXT: lh $1, 10($sp)
	; MIPS64R5-NEXT: lh $2, 8($sp)			; MIPS64R5-NEXT: lh $2, 8($sp)
	; MIPS64R5-NEXT: insert.d $w0[0], $2			; MIPS64R5-NEXT: insert.d $w0[0], $2
	; MIPS64R5-NEXT: insert.d $w0[1], $1			; MIPS64R5-NEXT: insert.d $w0[1], $1
	; MIPS64R5-NEXT: lh $1, 14($sp)			; MIPS64R5-NEXT: lh $1, 14($sp)
	; MIPS64R5-NEXT: lh $2, 12($sp)			; MIPS64R5-NEXT: lh $2, 12($sp)
	; MIPS64R5-NEXT: insert.d $w1[0], $2			; MIPS64R5-NEXT: insert.d $w1[0], $2
	; MIPS64R5-NEXT: insert.d $w1[1], $1			; MIPS64R5-NEXT: insert.d $w1[1], $1
	[5,739 lines not shown]

llvm/test/CodeGen/Mips/llvm-ir/store.ll

	[279 lines not shown]
	; MMR6-NEXT: # <MCOperand Reg:{{[0-9]+}}>			; MMR6-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MMR6-NEXT: # <MCOperand Reg:{{[0-9]+}}>			; MMR6-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MMR6-NEXT: # <MCOperand Expr:(%lo(c))>>			; MMR6-NEXT: # <MCOperand Expr:(%lo(c))>>
	; MMR6-NEXT: jrc $ra # <MCInst #{{[0-9]+}} JRC16_MM			; MMR6-NEXT: jrc $ra # <MCInst #{{[0-9]+}} JRC16_MM
	; MMR6-NEXT: # <MCOperand Reg:{{[0-9]+}}>>			; MMR6-NEXT: # <MCOperand Reg:{{[0-9]+}}>>
	;			;
	; MIPS4-LABEL: f3:			; MIPS4-LABEL: f3:
	; MIPS4: # %bb.0:			; MIPS4: # %bb.0:
	; MIPS4-NEXT: sll $1, $4, 0 # <MCInst #{{[0-9]+}} SLL			; MIPS4-NEXT: lui $1, %highest(c) # <MCInst #{{[0-9]+}} LUi64
	; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS4-NEXT: # <MCOperand Imm:0>>
	; MIPS4-NEXT: lui $2, %highest(c) # <MCInst #{{[0-9]+}} LUi64
	; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>			; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS4-NEXT: # <MCOperand Expr:(%highest(c))>>			; MIPS4-NEXT: # <MCOperand Expr:(%highest(c))>>
	; MIPS4-NEXT: daddiu $2, $2, %higher(c) # <MCInst #{{[0-9]+}} DADDiu			; MIPS4-NEXT: daddiu $1, $1, %higher(c) # <MCInst #{{[0-9]+}} DADDiu
	; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>			; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>			; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS4-NEXT: # <MCOperand Expr:(%higher(c))>>			; MIPS4-NEXT: # <MCOperand Expr:(%higher(c))>>
	; MIPS4-NEXT: dsll $2, $2, 16 # <MCInst #{{[0-9]+}} DSLL			; MIPS4-NEXT: dsll $1, $1, 16 # <MCInst #{{[0-9]+}} DSLL
	; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>			; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>			; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS4-NEXT: # <MCOperand Imm:16>>			; MIPS4-NEXT: # <MCOperand Imm:16>>
	; MIPS4-NEXT: daddiu $2, $2, %hi(c) # <MCInst #{{[0-9]+}} DADDiu			; MIPS4-NEXT: daddiu $1, $1, %hi(c) # <MCInst #{{[0-9]+}} DADDiu
	; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>			; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>			; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS4-NEXT: # <MCOperand Expr:(%hi(c))>>			; MIPS4-NEXT: # <MCOperand Expr:(%hi(c))>>
	; MIPS4-NEXT: dsll $2, $2, 16 # <MCInst #{{[0-9]+}} DSLL			; MIPS4-NEXT: dsll $1, $1, 16 # <MCInst #{{[0-9]+}} DSLL
	; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>			; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>			; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS4-NEXT: # <MCOperand Imm:16>>			; MIPS4-NEXT: # <MCOperand Imm:16>>
	; MIPS4-NEXT: jr $ra # <MCInst #{{[0-9]+}} JR			; MIPS4-NEXT: jr $ra # <MCInst #{{[0-9]+}} JR
	; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>>			; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>>
	; MIPS4-NEXT: sw $1, %lo(c)($2) # <MCInst #{{[0-9]+}} SW			; MIPS4-NEXT: sw $4, %lo(c)($1) # <MCInst #{{[0-9]+}} SW
	; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>			; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>			; MIPS4-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS4-NEXT: # <MCOperand Expr:(%lo(c))>>			; MIPS4-NEXT: # <MCOperand Expr:(%lo(c))>>
	;			;
	; MIPS64R6-LABEL: f3:			; MIPS64R6-LABEL: f3:
	; MIPS64R6: # %bb.0:			; MIPS64R6: # %bb.0:
	; MIPS64R6-NEXT: sll $1, $4, 0 # <MCInst #{{[0-9]+}} SLL			; MIPS64R6-NEXT: lui $1, %highest(c) # <MCInst #{{[0-9]+}} LUi64
	; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS64R6-NEXT: # <MCOperand Imm:0>>
	; MIPS64R6-NEXT: lui $2, %highest(c) # <MCInst #{{[0-9]+}} LUi64
	; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>			; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS64R6-NEXT: # <MCOperand Expr:(%highest(c))>>			; MIPS64R6-NEXT: # <MCOperand Expr:(%highest(c))>>
	; MIPS64R6-NEXT: daddiu $2, $2, %higher(c) # <MCInst #{{[0-9]+}} DADDiu			; MIPS64R6-NEXT: daddiu $1, $1, %higher(c) # <MCInst #{{[0-9]+}} DADDiu
	; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>			; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>			; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS64R6-NEXT: # <MCOperand Expr:(%higher(c))>>			; MIPS64R6-NEXT: # <MCOperand Expr:(%higher(c))>>
	; MIPS64R6-NEXT: dsll $2, $2, 16 # <MCInst #{{[0-9]+}} DSLL			; MIPS64R6-NEXT: dsll $1, $1, 16 # <MCInst #{{[0-9]+}} DSLL
	; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>			; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>			; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS64R6-NEXT: # <MCOperand Imm:16>>			; MIPS64R6-NEXT: # <MCOperand Imm:16>>
	; MIPS64R6-NEXT: daddiu $2, $2, %hi(c) # <MCInst #{{[0-9]+}} DADDiu			; MIPS64R6-NEXT: daddiu $1, $1, %hi(c) # <MCInst #{{[0-9]+}} DADDiu
	; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>			; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>			; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS64R6-NEXT: # <MCOperand Expr:(%hi(c))>>			; MIPS64R6-NEXT: # <MCOperand Expr:(%hi(c))>>
	; MIPS64R6-NEXT: dsll $2, $2, 16 # <MCInst #{{[0-9]+}} DSLL			; MIPS64R6-NEXT: dsll $1, $1, 16 # <MCInst #{{[0-9]+}} DSLL
	; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>			; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>			; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS64R6-NEXT: # <MCOperand Imm:16>>			; MIPS64R6-NEXT: # <MCOperand Imm:16>>
	; MIPS64R6-NEXT: jr $ra # <MCInst #{{[0-9]+}} JALR64			; MIPS64R6-NEXT: jr $ra # <MCInst #{{[0-9]+}} JALR64
	; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>			; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>>			; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>>
	; MIPS64R6-NEXT: sw $1, %lo(c)($2) # <MCInst #{{[0-9]+}} SW			; MIPS64R6-NEXT: sw $4, %lo(c)($1) # <MCInst #{{[0-9]+}} SW
	; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>			; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>			; MIPS64R6-NEXT: # <MCOperand Reg:{{[0-9]+}}>
	; MIPS64R6-NEXT: # <MCOperand Expr:(%lo(c))>>			; MIPS64R6-NEXT: # <MCOperand Expr:(%lo(c))>>
	store i32 %a, i32 * @c			store i32 %a, i32 * @c
	ret void			ret void
	}			}
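
The MIPS test updates above drop the `sll $1, $N, 0` that used to precede each `sw`. On MIPS64 that instruction form sign-extends the low 32 bits of a register, while `sw` writes only the low 32 bits, so the stored word is the same with or without it. The sketch below models both instructions on plain integers to show this; `sll0` and `storeWord` are hypothetical names, and the model ignores everything except the register value.

```cpp
#include <cstdint>

// Model of MIPS64 `sll $rd, $rs, 0`: keep the low 32 bits and
// sign-extend them to 64 bits.
inline std::uint64_t sll0(std::uint64_t Reg) {
  std::uint64_t Low = Reg & 0xFFFFFFFFULL;
  return (Low & 0x80000000ULL) ? (Low | 0xFFFFFFFF00000000ULL) : Low;
}

// Model of `sw`: only the low 32 bits of the register reach memory.
inline std::uint32_t storeWord(std::uint64_t Reg) {
  return static_cast<std::uint32_t>(Reg);
}
```

For any register value, `storeWord(sll0(Reg))` equals `storeWord(Reg)`, which is why the truncate can be folded into the store.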

	define void @f4(i64 %a) {			define void @f4(i64 %a) {
	[140 lines not shown]


Diff 357880
