This is an archive of the discontinued LLVM Phabricator instance.


Differential D89162

[SVE] Lower fixed length VECREDUCE_SEQ_FADD operation
Closed · Public

Authored by cameron.mcinally on Oct 9 2020, 2:00 PM.


Details

Reviewers

paulwalker-arm
kmclaughlin
sdesmalen
efriedma
nikic

Commits

rGa1cc274cb35f: [SVE] Lower fixed length VECREDUCE_SEQ_FADD operation

Summary

@paulwalker-arm, I'm guessing this hits the reduction legalisation problems that @kmclaughlin is working on? This patch currently falls apart during splitting of the reduction nodes. If not, I'll continue to build it out.

Some other notes:

It looks like NEON FADDA support is missing upstream too.

We'll likely need to change how the OperationAction types are determined for the reduction-with-accumulator ISD nodes (see LegalizeDAG.cpp change). The types are currently based off the start_value type, not the vector op type. @sdesmalen

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

cameron.mcinally created this revision. Oct 9 2020, 2:00 PM

Herald added a project: Restricted Project. Oct 9 2020, 2:00 PM

Herald added subscribers: llvm-commits, psnobl, hiraditya and 2 others.

cameron.mcinally requested review of this revision. Oct 9 2020, 2:00 PM

Harbormaster completed remote builds in B74657: Diff 297329. Oct 9 2020, 2:01 PM

Are you planning to add support for the normal VECREDUCE_FADD? I ask because that's likely to see more initial use upstream than the SEQ variant.

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
1154–1158 ↗	(On Diff #297329)	You'll need to move these below the main VECREDUCE_ options because VECREDUCE_FADD & VECREDUCE_FMUL only take a single operand.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1132–1135	I doubt it's worth custom lowering `v1f64`, given the expansion will be a single NEON ADD. Or is this because there is currently no expand code for the SEQ reductions? What about operations on vectors of `f16`?
16270	Given there's likely to be only one use of this, I think calling this `LowerVECREDUCE_SEQ_FADD` is more in line with the naming of the other lowering functions.
16281–16284	Looks like we're missing some obvious patterns. I've created D89235 to add them.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
226–227	This'll need to be more verbose because having SVE is only safe for scalable vectors. You'll need to look at the useSVEForFixedLengthVectorVT question if II is operating on fixed length types.
llvm/test/CodeGen/AArch64/sve-fixed-length-fp-reduce.ll
57	Looks like a stray X here.

You're right on all the comments. I stopped midway when I hit the legalisation issue, so that's why this patch is rough. That would need to be built out first before this work could continue. And I think NEON support should be built out before that. It sounds like I'm not stepping on toes, so I'll go in that order. Thanks.

Oh, and I can do VECREDUCE_FADD first. Just hit VECREDUCE_SEQ_FADD before I realized I need to add 'fast' to the intrinsic calls.

This is ready for review now...

Notice that I made useSVEForFixedLengthVectors*(...) public members so that the ExpandReductions pass can access them. I don't keep up with C++ best practices, so I'm not sure if there's a better way to grant access to these functions. Any suggestions?

Also, expansion is kind of ugly for this operation. The ExpandReductions pass is an IR pass that runs before legalization, so the decision to expand is made early. It would be possible to not rely on ExpandReductions, and instead extend normal legalization to do an ordered lowering, but that would duplicate what is already done. Does anyone feel strongly that expansion should be moved to Legalize or otherwise?

Some other notes:

It looks like NEON FADDA support is missing upstream too.

I made a mistake here. I don't see an FADDA instruction in the NEON ISA, only SVE. Not sure why I thought otherwise...

cameron.mcinally marked 5 inline comments as done. Oct 19 2020, 9:13 AM

cameron.mcinally added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
16270	I moved `LowerVECREDUCE_SEQ_FADD(...)` out from under `LowerVECREDUCE(...)`. Not sure if that helps or hurts @kmclaughlin, but I can change it if there's a problem.

Fix 80 column issue. No other changes intended...

Try again with 80 column fix...

Sorry @cameron.mcinally, I've not had much time for code reviews this week, although I will take a proper look tomorrow. I have a question though. You've added extra legalisation support but I don't see any explicit tests (or at least ones with matching check lines) for it. Is this something you need for this patch? (I'm guessing sve-fixed-length-fp-reduce.ll's stock NEON run line triggers the cases?) If so, there really should be a NEON-specific test file that verifies the widening and scalarisation changes, as the NEON run line for the "fixed-length" tests is more about ensuring no SVE instructions slip through.

In D89162#2347701, @paulwalker-arm wrote:

Sorry @cameron.mcinally, I've not had much time for code reviews this week, although I will take a proper look tomorrow. I have a question though. You've added extra legalisation support but I don't see any explicit tests (or at least ones with matching check lines) for it. Is this something you need for this patch? (I'm guessing sve-fixed-length-fp-reduce.ll's stock NEON run line triggers the cases?) If so, there really should be a NEON-specific test file that verifies the widening and scalarisation changes, as the NEON run line for the "fixed-length" tests is more about ensuring no SVE instructions slip through.

Thanks, Paul. No rush at all. Digressing a bit, we should probably sync-up on what else needs to be done for fixed length lowering. I believe this is the last operation on my list.

The legalisation changes are specific to VECREDUCE_SEQ_FADD on SVE. Prior to this patch, VECREDUCE_SEQ_FADD is expanded in the ExpandReductions pass before legalisation runs, so VECREDUCE_SEQ_FADD would never reach legalisation on NEON. (That's assuming NEON has no FADDA support. I might be wrong about that.)

The new tests would be broken without the legalisation changes, so I'm assuming that those are enough coverage. Maybe I'm missing something though...

In D89162#2347814, @cameron.mcinally wrote:

The new tests would be broken without the legalisation changes, so I'm assuming that those are enough coverage. Maybe I'm missing something though...

Are you sure? I took your patch for a test drive and removed all but the TLI.getOperationAction related change from Legalize*.{cpp, h} and the tests passed.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
16293	I think this can be ContainerVT?
16297	You use this in a couple of places, so might be worth extracting into `Zero =` ?
llvm/test/CodeGen/AArch64/sve-fixed-length-fp-reduce.ll
65	What does this mean? I was going to say you're missing VBITS_EQ_256 lines here but then noticed the output. What's your feeling on this? I see two routes:
1. Stick with the expanded output for this patch, in which case just having `VBITS_EQ_256-COUNT-33: fadd` is good enough.
2. For better code generation we're just missing the ability to split VECREDUCE_SEQ_FADD operations, which I don't think should be that hard: `VECREDUCE_SEQ_FADD A, X_512 => VECREDUCE_SEQ_FADD (VECREDUCE_SEQ_FADD A, X_LO), X_HI`?
I'm happy with either approach.

In D89162#2350199, @paulwalker-arm wrote:

In D89162#2347814, @cameron.mcinally wrote:

The new tests would be broken without the legalisation changes, so I'm assuming that those are enough coverage. Maybe I'm missing something though...

Are you sure? I took your patch for a test drive and removed all but the TLI.getOperationAction related change from Legalize*.{cpp, h} and the tests passed.

That doesn't sound right. E.g. we need the scalarization code for <1 x f64>. I'll check again...

llvm/test/CodeGen/AArch64/sve-fixed-length-fp-reduce.ll
65	"For better code generation we're just missing the ability to split VECREDUCE_SEQ_FADD operations, which I don't think should be that hard." Oh, you're right. I initially thought we couldn't preserve the ordering constraint if we split, but we could use the accumulator to chain them. Makes sense. I'll look into that...
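
To make the accumulator-chaining idea concrete, here is a minimal standalone C++ sketch. It is not LLVM code: `seqFAdd` and `seqFAddSplit` are names invented for illustration, a plain `std::vector<float>` stands in for the vector operand, and the sketch assumes an even element count. The point is only that the split form performs the same additions in the same order, so the ordering constraint should be preserved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ordered (sequential) floating-point add reduction: the start value is the
// accumulator and the lanes are added strictly from lane 0 upwards.
float seqFAdd(float Acc, const std::vector<float> &V) {
  for (float Lane : V)
    Acc += Lane;
  return Acc;
}

// Split form (illustrative): reduce the low half first, then pass that scalar
// result in as the accumulator of the high-half reduction.
float seqFAddSplit(float Acc, const std::vector<float> &V) {
  assert(V.size() % 2 == 0 && "sketch only handles even element counts");
  const std::size_t Half = V.size() / 2;
  const std::vector<float> Lo(V.begin(), V.begin() + Half);
  const std::vector<float> Hi(V.begin() + Half, V.end());
  return seqFAdd(seqFAdd(Acc, Lo), Hi);
}
```

In DAG terms this mirrors `VECREDUCE_SEQ_FADD (VECREDUCE_SEQ_FADD A, X_LO), X_HI`: the scalar result of the inner reduction becomes the start value of the outer one.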

This should be split into two parts:

1. A patch that implements all the necessary legalizations for VECREDUCE_SEQ_* and enables use of SDAG legalization on AArch64 (and possibly ARM) unconditionally. This may need some additional test coverage to trigger all possible legalizations. Necessary legalizations also include float legalization, not just vector legalization.
2. A patch that adds improved lowerings using SVE instructions based on that.

This revision now requires changes to proceed. Oct 23 2020, 8:16 AM

@nikic Why does wanting to implement VECREDUCE_SEQ_FADD for SVE necessitate having to implement full support for NEON (AArch64 and Arm)?

@paulwalker-arm There is no need to implement any target-specific support. The legalization outcome will be a simple chain of extracts and fadd/fmuls. It does not need to generate good code, just not assert for any VTs.
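
For reference, a rough standalone C++ sketch of what such an expansion computes. It is illustrative only: `expandOrdered` and `expandReassoc` are hypothetical names, not LLVM functions, and `std::vector<float>` stands in for the vector operand. The ordered form is one element extract plus one fadd per lane, in lane order. The pairwise form is included only for contrast: it is the kind of shape reassociation would permit, and in floating point it can round differently, which is why the ordered opcode cannot simply reuse it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ordered expansion: extract lane 0 and add, extract lane 1 and add, and so on.
float expandOrdered(float Start, const std::vector<float> &V) {
  float Acc = Start;
  for (std::size_t I = 0; I < V.size(); ++I)
    Acc = Acc + V[I]; // one "extract element + fadd" step per lane
  return Acc;
}

// Pairwise (tree) expansion, shown only for contrast. This ordering is only
// acceptable when reassociation is allowed.
float expandReassoc(float Start, std::vector<float> V) {
  assert(!V.empty() && (V.size() & (V.size() - 1)) == 0 &&
         "sketch only handles power-of-two lane counts");
  for (std::size_t Width = V.size() / 2; Width >= 1; Width /= 2)
    for (std::size_t I = 0; I < Width; ++I)
      V[I] = V[I] + V[I + Width];
  return Start + V[0];
}
```

With IEEE single precision, a zero start value and the lanes {1e8, 1, -1e8, 1}, the ordered form gives 1 while the pairwise form gives 2, because 1e8 + 1 rounds back to 1e8.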

In D89162#2350313, @nikic wrote:

@paulwalker-arm There is no need to implement any target-specific support. The legalization outcome will be a simple chain of extracts and fadd/fmuls. It does not need to generate good code, just not assert for any VTs.

Sure, I think there's been a misunderstanding. The only target-specific piece we're talking about is the existing shouldExpandReduction hook that controls whether the code generator will see the VECREDUCE_SEQ_FADD to legalise. So I'm wondering if an acceptable ordering is to first allow the lowering of legal VECREDUCE_SEQ_FADD operations (which is all that shouldExpandReduction will let through), and then tackle the type legalisation problem second so that shouldExpandReduction can let everything through when SVE is enabled (for which there'll already be a test ready and waiting), leaving NEON as it is today.

Updating patch, but not ready for a serious review yet as I haven't started the splitting work. I'm still not convinced we can handle splitting appropriately with the current setup, but will comment on that separately.

I caused a big misunderstanding here, so let me try to unwind it. Paul is correct. The legalizations are NOT necessary for SVE support. I made a mistake when preparing this patch. I.e. I built the legalization changes out before I finalized the shouldExpandReduction(...) changes, so ended up in a weird state with NEON.

The ExpandReductions pass is somewhat unusual and that threw me off. It's an IR pass that runs early in llc, before legalisation. I've never seen one of those, so it tripped me up. I apologize for the confusion. That said, I wonder if the reductions should really be expanded during legalization. That would have been a more natural choice to me.

@cameron.mcinally: I'm sure you know this, but just in case it saves some time I can confirm you will not hit the splitting code until after you relax shouldExpandReduction. So I think your current patch is complete, and it really comes down to whether adding the support this way round (i.e. only allow legal types for SVE, then allow all types and implement the legalisation) is acceptable to @nikic.

Thanks for the clarification, I indeed misunderstood this. If you have a way to accurately pick out the legal types, then this is of course fine. My only concern was to not introduce incomplete legalization support for these opcodes.
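
The gating agreed on here can be summarised with a small standalone model. This is a sketch, not the actual hook: `ReductionKind`, `ReductionCall` and `shouldExpandReductionModel` are invented for illustration, and the two booleans stand in for the `reassoc` fast-math flag and for the `useSVEForFixedLengthVectorVT(VT, /*OverrideNEON=*/true)` query. An ordered fadd reduction is let through to the code generator only when SVE will be used for that fixed length type. Every other ordered FP reduction is still expanded in IR.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins; these are not LLVM types.
enum class ReductionKind { FAdd, FMul, Other };

struct ReductionCall {
  ReductionKind Kind;
  bool AllowReassoc;       // models the 'reassoc' fast-math flag on the call
  bool SVEFixedLengthType; // models the useSVEForFixedLengthVectorVT query
};

// Returns true when the reduction should be expanded in IR before the code
// generator ever sees it.
bool shouldExpandReductionModel(const ReductionCall &Call, bool HasSVE) {
  switch (Call.Kind) {
  case ReductionKind::FAdd:
    // Ordered fadd on a type SVE will handle: keep it for custom lowering.
    if (HasSVE && Call.SVEFixedLengthType)
      return false;
    return !Call.AllowReassoc;
  case ReductionKind::FMul:
    // No legalization support for ordered FP reductions in this model.
    return !Call.AllowReassoc;
  case ReductionKind::Other:
    // Everything else is left to legalization.
    return false;
  }
  assert(false && "unhandled reduction kind");
  return false;
}
```

Relaxing the hook later, once type legalisation is in place, would amount to dropping the per-type query from the `FAdd` case in this model.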

The ExpandReductions pass is somewhat unusual and that threw me off. It's an IR pass that runs early in llc, before legalisation. I've never seen one of those, so it tripped me up. I apologize for the confusion. That said, I wonder if the reductions should really be expanded during legalization. That would have been a more natural choice to me.

The eventual goal here is to expand reductions during legalization only. The IR expansion exists because the DAG legalization support has been patchy historically, with VECREDUCE_SEQ_* being the last remaining hole.

In D89162#2350619, @nikic wrote:

The eventual goal here is to expand reductions during legalization only. The IR expansion exists because the DAG legalization support has been patchy historically, with VECREDUCE_SEQ_* being the last remaining hole.

Ah, ok. I'm happy to hear we're all in agreement. I might be able to volunteer some time for that work, but it's not really my prime directive.

Again, apologies for the mix up. There were a lot of moving parts for this patch and I dropped the ball. *^_^*

@paulwalker-arm, back to the splitting discussion...

Looks like we need to wait for the full VECREDUCE_SEQ_FADD legalization changes before the splitting work can be done. We could make changes to shouldExpandReduction(), to recognize large vectors that could be split, but I suspect that code will end up too hacky.

Unless I'm missing something. What are your thoughts?

With this patch[1] landed I believe operation legalisation is now a solved problem for SVE (well for fixed length vectors). I think it's worth tackling the type legalisation side of things rather than overcomplicating shouldExpandReduction. Of course that's easy for me to say given it's your time :) but you've already done part of the work based on the older patch.

Your WidenVecOp_VECREDUCE changes looked good and we've already discussed how splitting can be done in a generic way. With these two pieces I think shouldExpandReduction just needs to test for SVE.

When that lands we hit another decision point to decide whether it's worth avoiding SVE for very small vectors and thus whether to implement the scalarisation.

[1] I just wanted to highlight my previous VBITS_EQ_256-COUNT-33: fadd comment as this gives us a bit more test coverage and is something that will obviously fail (in a good way) when the splitting work is available.

This revision is now accepted and ready to land. Oct 23 2020, 11:53 AM

Closed by commit rGa1cc274cb35f: [SVE] Lower fixed length VECREDUCE_SEQ_FADD operation (authored by cameron.mcinally). Oct 23 2020, 2:24 PM

This revision was automatically updated to reflect the committed changes.

cameron.mcinally added a commit: rGa1cc274cb35f: [SVE] Lower fixed length VECREDUCE_SEQ_FADD operation.

[1] I just wanted to highlight my previous VBITS_EQ_256-COUNT-33: fadd comment as this gives us a bit more test coverage and is something that will obviously fail (in a good way) when the splitting work is available.

Got it. Also added a FIXME to remind us to move the useSVEForFixedLengthVectors*() functions back to private scope when the reduction legalization is complete.

cameron.mcinally mentioned this in D90247: [AArch64] Add legalizations for VECREDUCE_SEQ_FADD. Oct 27 2020, 9:39 AM

Revision Contents

Path	Size
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp	4 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.h	15 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	38 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h	10 lines
llvm/test/CodeGen/AArch64/sve-fixed-length-fp-reduce.ll	208 lines

Diff 300412

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp

[465 lines hidden] #include "llvm/IR/ConstrainedOps.def"
   case ISD::SDIVFIXSAT:
   case ISD::UDIVFIX:
   case ISD::UDIVFIXSAT: {
     unsigned Scale = Node->getConstantOperandVal(2);
     Action = TLI.getFixedPointOperationAction(Node->getOpcode(),
                                               Node->getValueType(0), Scale);
     break;
   }
+  case ISD::VECREDUCE_SEQ_FADD:
+    Action = TLI.getOperationAction(Node->getOpcode(),
+                                    Node->getOperand(1).getValueType());
+    break;
   case ISD::SINT_TO_FP:
   case ISD::UINT_TO_FP:
   case ISD::VECREDUCE_ADD:
   case ISD::VECREDUCE_MUL:
   case ISD::VECREDUCE_AND:
   case ISD::VECREDUCE_OR:
   case ISD::VECREDUCE_XOR:
   case ISD::VECREDUCE_SMAX:
[983 lines hidden]

llvm/lib/Target/AArch64/AArch64ISelLowering.h

[774 lines hidden] public:
   /// merge. However, merging them creates a BUILD_VECTOR that is just as
   /// illegal as the original, thus leading to an infinite legalisation loop.
   /// NOTE: Once BUILD_VECTOR is legal or can be custom lowered for all legal
   /// vector types this override can be removed.
   bool mergeStoresAfterLegalization(EVT VT) const override {
     return !useSVEForFixedLengthVectors();
   }

+  // FIXME: Move useSVEForFixedLengthVectors*() back to private scope once
+  // reduction legalization is complete.
+  bool useSVEForFixedLengthVectors() const;
+  // Normally SVE is only used for byte size vectors that do not fit within a
+  // NEON vector. This changes when OverrideNEON is true, allowing SVE to be
+  // used for 64bit and 128bit vectors as well.
+  bool useSVEForFixedLengthVectorVT(EVT VT, bool OverrideNEON = false) const;
+
 private:
   /// Keep a pointer to the AArch64Subtarget around so that we can
   /// make the right decision when generating code for different targets.
   const AArch64Subtarget *Subtarget;

   bool isExtFreeImpl(const Instruction *Ext) const override;

   void addTypeForNEON(MVT VT, MVT PromotedBitwiseVT);
[139 lines hidden] private:
   SDValue LowerSVEStructLoad(unsigned Intrinsic, ArrayRef<SDValue> LoadOps,
                              EVT VT, SelectionDAG &DAG, const SDLoc &DL) const;

   SDValue LowerFixedLengthVectorIntDivideToSVE(SDValue Op,
                                                SelectionDAG &DAG) const;
   SDValue LowerFixedLengthVectorIntExtendToSVE(SDValue Op,
                                                SelectionDAG &DAG) const;
   SDValue LowerFixedLengthVectorLoadToSVE(SDValue Op, SelectionDAG &DAG) const;
+  SDValue LowerVECREDUCE_SEQ_FADD(SDValue ScalarOp, SelectionDAG &DAG) const;
   SDValue LowerFixedLengthReductionToSVE(unsigned Opcode, SDValue ScalarOp,
                                          SelectionDAG &DAG) const;
   SDValue LowerFixedLengthVectorSelectToSVE(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerFixedLengthVectorSetccToSVE(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerFixedLengthVectorStoreToSVE(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerFixedLengthVectorTruncateToSVE(SDValue Op,
                                               SelectionDAG &DAG) const;

[55 lines hidden] void ReplaceExtractSubVectorResults(SDNode *N,
                                       SelectionDAG &DAG) const;

   bool shouldNormalizeToSelectSequence(LLVMContext &, EVT) const override;

   void finalizeLowering(MachineFunction &MF) const override;

   bool shouldLocalize(const MachineInstr &MI,
                       const TargetTransformInfo *TTI) const override;

-  bool useSVEForFixedLengthVectors() const;
-  // Normally SVE is only used for byte size vectors that do not fit within a
-  // NEON vector. This changes when OverrideNEON is true, allowing SVE to be
-  // used for 64bit and 128bit vectors as well.
-  bool useSVEForFixedLengthVectorVT(EVT VT, bool OverrideNEON = false) const;
 };

 namespace AArch64 {
 FastISel *createFastISel(FunctionLoweringInfo &funcInfo,
                          const TargetLibraryInfo *libInfo);
 } // end namespace AArch64

 } // end namespace llvm

 #endif

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[1,113 lines hidden] if (useSVEForFixedLengthVectors()) {
       setOperationAction(ISD::UMAX, MVT::v1i64, Custom);
       setOperationAction(ISD::UMAX, MVT::v2i64, Custom);
       setOperationAction(ISD::UMIN, MVT::v1i64, Custom);
       setOperationAction(ISD::UMIN, MVT::v2i64, Custom);
       setOperationAction(ISD::VECREDUCE_SMAX, MVT::v2i64, Custom);
       setOperationAction(ISD::VECREDUCE_SMIN, MVT::v2i64, Custom);
       setOperationAction(ISD::VECREDUCE_UMAX, MVT::v2i64, Custom);
       setOperationAction(ISD::VECREDUCE_UMIN, MVT::v2i64, Custom);

+      // Int operations with no NEON support.
       for (auto VT : {MVT::v8i8, MVT::v16i8, MVT::v4i16, MVT::v8i16,
                       MVT::v2i32, MVT::v4i32, MVT::v2i64}) {
         setOperationAction(ISD::VECREDUCE_AND, VT, Custom);
         setOperationAction(ISD::VECREDUCE_OR, VT, Custom);
         setOperationAction(ISD::VECREDUCE_XOR, VT, Custom);
       }

+      // FP operations with no NEON support.
+      for (auto VT : {MVT::v4f16, MVT::v8f16, MVT::v2f32, MVT::v4f32,
+                      MVT::v1f64, MVT::v2f64})
+        setOperationAction(ISD::VECREDUCE_SEQ_FADD, VT, Custom);
+
       // Use SVE for vectors with more than 2 elements.
       for (auto VT : {MVT::v4f16, MVT::v8f16, MVT::v4f32})
         setOperationAction(ISD::VECREDUCE_FADD, VT, Custom);
     }
   }

   PredictableSelectIsExpensive = Subtarget->predictableSelectIsExpensive();
 }
[124 lines hidden] void AArch64TargetLowering::addTypeForFixedLengthSVE(MVT VT) {
   setOperationAction(ISD::SUB, VT, Custom);
   setOperationAction(ISD::TRUNCATE, VT, Custom);
   setOperationAction(ISD::UDIV, VT, Custom);
   setOperationAction(ISD::UMAX, VT, Custom);
   setOperationAction(ISD::UMIN, VT, Custom);
   setOperationAction(ISD::VECREDUCE_ADD, VT, Custom);
   setOperationAction(ISD::VECREDUCE_AND, VT, Custom);
   setOperationAction(ISD::VECREDUCE_FADD, VT, Custom);
+  setOperationAction(ISD::VECREDUCE_SEQ_FADD, VT, Custom);
   setOperationAction(ISD::VECREDUCE_FMAX, VT, Custom);
   setOperationAction(ISD::VECREDUCE_FMIN, VT, Custom);
   setOperationAction(ISD::VECREDUCE_OR, VT, Custom);
   setOperationAction(ISD::VECREDUCE_SMAX, VT, Custom);
   setOperationAction(ISD::VECREDUCE_SMIN, VT, Custom);
   setOperationAction(ISD::VECREDUCE_UMAX, VT, Custom);
   setOperationAction(ISD::VECREDUCE_UMIN, VT, Custom);
   setOperationAction(ISD::VECREDUCE_XOR, VT, Custom);
[2,682 lines hidden] SDValue AArch64TargetLowering::LowerOperation(SDValue Op,
   case ISD::FLT_ROUNDS_:
     return LowerFLT_ROUNDS_(Op, DAG);
   case ISD::MUL:
     return LowerMUL(Op, DAG);
   case ISD::INTRINSIC_WO_CHAIN:
     return LowerINTRINSIC_WO_CHAIN(Op, DAG);
   case ISD::STORE:
     return LowerSTORE(Op, DAG);
+  case ISD::VECREDUCE_SEQ_FADD:
+    return LowerVECREDUCE_SEQ_FADD(Op, DAG);
   case ISD::VECREDUCE_ADD:
   case ISD::VECREDUCE_AND:
   case ISD::VECREDUCE_OR:
   case ISD::VECREDUCE_XOR:
   case ISD::VECREDUCE_SMAX:
   case ISD::VECREDUCE_SMIN:
   case ISD::VECREDUCE_UMAX:
   case ISD::VECREDUCE_UMIN:
[12,277 lines hidden] assert(useSVEForFixedLengthVectorVT(V.getValueType()) &&
            "Only fixed length vectors are supported!");
     Ops.push_back(convertToScalableVector(DAG, ContainerVT, V));
   }

   auto ScalableRes = DAG.getNode(Op.getOpcode(), SDLoc(Op), ContainerVT, Ops);
   return convertFromScalableVector(DAG, VT, ScalableRes);
 }

+SDValue AArch64TargetLowering::LowerVECREDUCE_SEQ_FADD(SDValue ScalarOp,
+                                                       SelectionDAG &DAG) const {
+  SDLoc DL(ScalarOp);
+  SDValue AccOp = ScalarOp.getOperand(0);
+  SDValue VecOp = ScalarOp.getOperand(1);
+  EVT SrcVT = VecOp.getValueType();
+  EVT ResVT = SrcVT.getVectorElementType();
+
+  // Only fixed length FADDA handled for now.
+  if (!useSVEForFixedLengthVectorVT(SrcVT, /*OverrideNEON=*/true))
+    return SDValue();
+
+  SDValue Pg = getPredicateForVector(DAG, DL, SrcVT);
+  EVT ContainerVT = getContainerForFixedLengthVector(DAG, SrcVT);
+  SDValue Zero = DAG.getConstant(0, DL, MVT::i64);
+
+  // Convert operands to Scalable.
+  AccOp = DAG.getNode(ISD::INSERT_VECTOR_ELT, DL, ContainerVT,
+                      DAG.getUNDEF(ContainerVT), AccOp, Zero);
+  VecOp = convertToScalableVector(DAG, ContainerVT, VecOp);
+
+  // Perform reduction.
+  SDValue Rdx = DAG.getNode(AArch64ISD::FADDA_PRED, DL, ContainerVT,
+                            Pg, AccOp, VecOp);
+
+  return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, ResVT, Rdx, Zero);
+}
+
 SDValue AArch64TargetLowering::LowerFixedLengthReductionToSVE(unsigned Opcode,
     SDValue ScalarOp, SelectionDAG &DAG) const {
   SDLoc DL(ScalarOp);
   SDValue VecOp = ScalarOp.getOperand(0);
   EVT SrcVT = VecOp.getValueType();

   SDValue Pg = getPredicateForVector(DAG, DL, SrcVT);
   EVT ContainerVT = getContainerForFixedLengthVector(DAG, SrcVT);
[69 lines hidden]

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h

Show First 20 Lines • Show All 216 Lines • ▼ Show 20 Lines	int getInterleavedMemoryOpCost(
bool UseMaskForCond = false, bool UseMaskForGaps = false);		bool UseMaskForCond = false, bool UseMaskForGaps = false);

bool		bool
shouldConsiderAddressTypePromotion(const Instruction &I,		shouldConsiderAddressTypePromotion(const Instruction &I,
bool &AllowPromotionWithoutCommonHeader);		bool &AllowPromotionWithoutCommonHeader);

bool shouldExpandReduction(const IntrinsicInst *II) const {		bool shouldExpandReduction(const IntrinsicInst *II) const {
switch (II->getIntrinsicID()) {		switch (II->getIntrinsicID()) {
case Intrinsic::vector_reduce_fadd:		case Intrinsic::vector_reduce_fadd: {
		Value *VecOp = II->getArgOperand(1);
		EVT VT = TLI->getValueType(getDataLayout(), VecOp->getType());
		paulwalker-armUnsubmitted Done Reply Inline Actions This'll need to be more verbose because having SVE is only safe for scalable vectors. You'll need to look at the useSVEForFixedLengthVectorVT question if II is operating on fixed length types. paulwalker-arm: This'll need to be more verbose because having SVE is only safe for scalable vectors. You'll…
		if (ST->hasSVE() &&
		TLI->useSVEForFixedLengthVectorVT(VT, /OverrideNEON=/true))
		return false;

		return !II->getFastMathFlags().allowReassoc();
		}
case Intrinsic::vector_reduce_fmul:		case Intrinsic::vector_reduce_fmul:
// We don't have legalization support for ordered FP reductions.		// We don't have legalization support for ordered FP reductions.
return !II->getFastMathFlags().allowReassoc();		return !II->getFastMathFlags().allowReassoc();

default:		default:
// Don't expand anything else, let legalization deal with it.		// Don't expand anything else, let legalization deal with it.
return false;		return false;
}		}

llvm/test/CodeGen/AArch64/sve-fixed-length-fp-reduce.ll

	; RUN: llc -aarch64-sve-vector-bits-min=2048 -asm-verbose=0 < %s | FileCheck %s -D#VBYTES=256 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024,VBITS_GE_2048			; RUN: llc -aarch64-sve-vector-bits-min=2048 -asm-verbose=0 < %s | FileCheck %s -D#VBYTES=256 -check-prefixes=CHECK,VBITS_GE_512,VBITS_GE_1024,VBITS_GE_2048

	target triple = "aarch64-unknown-linux-gnu"			target triple = "aarch64-unknown-linux-gnu"

	; Don't use SVE when its registers are no bigger than NEON.			; Don't use SVE when its registers are no bigger than NEON.
	; NO_SVE-NOT: ptrue			; NO_SVE-NOT: ptrue

	;			;
				; FADDA
				;

				; No single instruction NEON support. Use SVE.
				define half @fadda_v4f16(half %start, <4 x half> %a) #0 {
				; CHECK-LABEL: fadda_v4f16:
				; CHECK: ptrue [[PG:p[0-9]+]].h, vl4
				; CHECK-NEXT: fadda h0, [[PG]], h0, z1.h
				; CHECK-NEXT: ret
				%res = call half @llvm.vector.reduce.fadd.v4f16(half %start, <4 x half> %a)
				ret half %res
				}

				; No single instruction NEON support. Use SVE.
				define half @fadda_v8f16(half %start, <8 x half> %a) #0 {
				; CHECK-LABEL: fadda_v8f16:
				; CHECK: ptrue [[PG:p[0-9]+]].h, vl8
				; CHECK-NEXT: fadda h0, [[PG]], h0, z1.h
				; CHECK-NEXT: ret
				%res = call half @llvm.vector.reduce.fadd.v8f16(half %start, <8 x half> %a)
				ret half %res
				}

				define half @fadda_v16f16(half %start, <16 x half>* %a) #0 {
				; CHECK-LABEL: fadda_v16f16:
				; CHECK: ptrue [[PG:p[0-9]+]].h, vl16
				; CHECK-NEXT: ld1h { [[OP:z[0-9]+]].h }, [[PG]]/z, [x0]
				; CHECK-NEXT: fadda h0, [[PG]], h0, [[OP]].h
				; CHECK-NEXT: ret
				%op = load <16 x half>, <16 x half>* %a
				%res = call half @llvm.vector.reduce.fadd.v16f16(half %start, <16 x half> %op)
				ret half %res
				}

				paulwalker-arm (Done): Looks like a stray X here.
				define half @fadda_v32f16(half %start, <32 x half>* %a) #0 {
				; CHECK-LABEL: fadda_v32f16:
				; VBITS_GE_512: ptrue [[PG:p[0-9]+]].h, vl32
				; VBITS_GE_512-NEXT: ld1h { [[OP:z[0-9]+]].h }, [[PG]]/z, [x0]
				; VBITS_GE_512-NEXT: fadda h0, [[PG]], h0, [[OP]].h
				; VBITS_GE_512-NEXT: ret

				; Ensure sensible type legalisation.
				paulwalker-arm (Not Done): What does this mean? I was going to say you're missing VBITS_EQ_256 lines here but then noticed the output. What's your feeling on this? I see two routes:
				1. Stick with the expanded output for this patch, in which case just having `VBITS_EQ_256-COUNT-33: fadd` is good enough.
				2. For better code generation we're just missing the ability to split VECREDUCE_SEQ_FADD operations, which I don't think should be that hard. `VECREDUCE_SEQ_FADD A, X_512 => VECREDUCE_SEQ_FADD (VECREDUCE_SEQ_FADD X, X_LO), X_HI`?
				I'm happy with either approach.
				cameron.mcinally (Author, Done):
				> For better code generation we're just missing the ability to split VECREDUCE_SEQ_FADD operations, which I don't think should be that hard.
				Oh, you're right. I initially thought we couldn't preserve the ordering constraint if we split, but we could use the accumulator to chain them. Makes sense. I'll look into that...
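The chaining idea discussed in this thread can be modelled outside of LLVM. The sketch below is a standalone C++ model, not SelectionDAG code: `seqFAdd` stands in for an ordered VECREDUCE_SEQ_FADD, and `seqFAddSplit` (a hypothetical name) reduces the low half with the original start value and then passes that result as the start value for the high half. It assumes the inner start value in the reviewer's shorthand is the original accumulator. Because the additions happen in the same left-to-right order, the two functions should agree bit for bit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Model of an ordered floating-point add reduction:
// ((Start + V[0]) + V[1]) + ... evaluated strictly left to right.
static float seqFAdd(float Start, const float *V, std::size_t N) {
  float Acc = Start;
  for (std::size_t I = 0; I < N; ++I)
    Acc = Acc + V[I];
  return Acc;
}

// Hypothetical split of the same reduction into two halves.
// The low half uses the original start value; its result is then
// chained in as the start value (accumulator) of the high half.
static float seqFAddSplit(float Start, const float *V, std::size_t N) {
  const std::size_t Half = N / 2;
  const float Lo = seqFAdd(Start, V, Half);
  return seqFAdd(Lo, V + Half, N - Half);
}

// Bitwise comparison, so that differences in rounding are not hidden
// and NaN payloads or signed zeros are compared exactly.
static bool sameBits(float A, float B) {
  return std::memcmp(&A, &B, sizeof(float)) == 0;
}
```

For example, with a start value of 0 and the elements {1e8f, 1.0f, -1e8f, 1.0f}, both functions perform the identical sequence of four additions, whereas a reassociated reduction would be free to pair the elements differently and round differently.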
				; VBITS_EQ_256-COUNT-32: fadd
				; VBITS_EQ_256: ret
				%op = load <32 x half>, <32 x half>* %a
				%res = call half @llvm.vector.reduce.fadd.v32f16(half %start, <32 x half> %op)
				ret half %res
				}

				define half @fadda_v64f16(half %start, <64 x half>* %a) #0 {
				; CHECK-LABEL: fadda_v64f16:
				; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].h, vl64
				; VBITS_GE_1024-NEXT: ld1h { [[OP:z[0-9]+]].h }, [[PG]]/z, [x0]
				; VBITS_GE_1024-NEXT: fadda h0, [[PG]], h0, [[OP]].h
				; VBITS_GE_1024-NEXT: ret
				%op = load <64 x half>, <64 x half>* %a
				%res = call half @llvm.vector.reduce.fadd.v64f16(half %start, <64 x half> %op)
				ret half %res
				}

				define half @fadda_v128f16(half %start, <128 x half>* %a) #0 {
				; CHECK-LABEL: fadda_v128f16:
				; VBITS_GE_2048: ptrue [[PG:p[0-9]+]].h, vl128
				; VBITS_GE_2048-NEXT: ld1h { [[OP:z[0-9]+]].h }, [[PG]]/z, [x0]
				; VBITS_GE_2048-NEXT: fadda h0, [[PG]], h0, [[OP]].h
				; VBITS_GE_2048-NEXT: ret
				%op = load <128 x half>, <128 x half>* %a
				%res = call half @llvm.vector.reduce.fadd.v128f16(half %start, <128 x half> %op)
				ret half %res
				}

				; No single instruction NEON support. Use SVE.
				define float @fadda_v2f32(float %start, <2 x float> %a) #0 {
				; CHECK-LABEL: fadda_v2f32:
				; CHECK: ptrue [[PG:p[0-9]+]].s, vl2
				; CHECK-NEXT: fadda s0, [[PG]], s0, z1.s
				; CHECK-NEXT: ret
				%res = call float @llvm.vector.reduce.fadd.v2f32(float %start, <2 x float> %a)
				ret float %res
				}

				; No single instruction NEON support. Use SVE.
				define float @fadda_v4f32(float %start, <4 x float> %a) #0 {
				; CHECK-LABEL: fadda_v4f32:
				; CHECK: ptrue [[PG:p[0-9]+]].s, vl4
				; CHECK-NEXT: fadda s0, [[PG]], s0, z1.s
				; CHECK-NEXT: ret
				%res = call float @llvm.vector.reduce.fadd.v4f32(float %start, <4 x float> %a)
				ret float %res
				}

				define float @fadda_v8f32(float %start, <8 x float>* %a) #0 {
				; CHECK-LABEL: fadda_v8f32:
				; CHECK: ptrue [[PG:p[0-9]+]].s, vl8
				; CHECK-NEXT: ld1w { [[OP:z[0-9]+]].s }, [[PG]]/z, [x0]
				; CHECK-NEXT: fadda s0, [[PG]], s0, [[OP]].s
				; CHECK-NEXT: ret
				%op = load <8 x float>, <8 x float>* %a
				%res = call float @llvm.vector.reduce.fadd.v8f32(float %start, <8 x float> %op)
				ret float %res
				}

				define float @fadda_v16f32(float %start, <16 x float>* %a) #0 {
				; CHECK-LABEL: fadda_v16f32:
				; VBITS_GE_512: ptrue [[PG:p[0-9]+]].s, vl16
				; VBITS_GE_512-NEXT: ld1w { [[OP:z[0-9]+]].s }, [[PG]]/z, [x0]
				; VBITS_GE_512-NEXT: fadda s0, [[PG]], s0, [[OP]].s
				; VBITS_GE_512-NEXT: ret

				; Ensure sensible type legalisation.
				; VBITS_EQ_256-COUNT-16: fadd
				; VBITS_EQ_256: ret
				%op = load <16 x float>, <16 x float>* %a
				%res = call float @llvm.vector.reduce.fadd.v16f32(float %start, <16 x float> %op)
				ret float %res
				}

				define float @fadda_v32f32(float %start, <32 x float>* %a) #0 {
				; CHECK-LABEL: fadda_v32f32:
				; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].s, vl32
				; VBITS_GE_1024-NEXT: ld1w { [[OP:z[0-9]+]].s }, [[PG]]/z, [x0]
				; VBITS_GE_1024-NEXT: fadda s0, [[PG]], s0, [[OP]].s
				; VBITS_GE_1024-NEXT: ret
				%op = load <32 x float>, <32 x float>* %a
				%res = call float @llvm.vector.reduce.fadd.v32f32(float %start, <32 x float> %op)
				ret float %res
				}

				define float @fadda_v64f32(float %start, <64 x float>* %a) #0 {
				; CHECK-LABEL: fadda_v64f32:
				; VBITS_GE_2048: ptrue [[PG:p[0-9]+]].s, vl64
				; VBITS_GE_2048-NEXT: ld1w { [[OP:z[0-9]+]].s }, [[PG]]/z, [x0]
				; VBITS_GE_2048-NEXT: fadda s0, [[PG]], s0, [[OP]].s
				; VBITS_GE_2048-NEXT: ret
				%op = load <64 x float>, <64 x float>* %a
				%res = call float @llvm.vector.reduce.fadd.v64f32(float %start, <64 x float> %op)
				ret float %res
				}

				; No single instruction NEON support. Use SVE.
				define double @fadda_v1f64(double %start, <1 x double> %a) #0 {
				; CHECK-LABEL: fadda_v1f64:
				; CHECK: ptrue [[PG:p[0-9]+]].d, vl1
				; CHECK-NEXT: fadda d0, [[PG]], d0, z1.d
				; CHECK-NEXT: ret
				%res = call double @llvm.vector.reduce.fadd.v1f64(double %start, <1 x double> %a)
				ret double %res
				}

				; No single instruction NEON support. Use SVE.
				define double @fadda_v2f64(double %start, <2 x double> %a) #0 {
				; CHECK-LABEL: fadda_v2f64:
				; CHECK: ptrue [[PG:p[0-9]+]].d, vl2
				; CHECK-NEXT: fadda d0, [[PG]], d0, z1.d
				; CHECK-NEXT: ret
				%res = call double @llvm.vector.reduce.fadd.v2f64(double %start, <2 x double> %a)
				ret double %res
				}

				define double @fadda_v4f64(double %start, <4 x double>* %a) #0 {
				; CHECK-LABEL: fadda_v4f64:
				; CHECK: ptrue [[PG:p[0-9]+]].d, vl4
				; CHECK-NEXT: ld1d { [[OP:z[0-9]+]].d }, [[PG]]/z, [x0]
				; CHECK-NEXT: fadda d0, [[PG]], d0, [[OP]].d
				; CHECK-NEXT: ret
				%op = load <4 x double>, <4 x double>* %a
				%res = call double @llvm.vector.reduce.fadd.v4f64(double %start, <4 x double> %op)
				ret double %res
				}

				define double @fadda_v8f64(double %start, <8 x double>* %a) #0 {
				; CHECK-LABEL: fadda_v8f64:
				; VBITS_GE_512: ptrue [[PG:p[0-9]+]].d, vl8
				; VBITS_GE_512-NEXT: ld1d { [[OP:z[0-9]+]].d }, [[PG]]/z, [x0]
				; VBITS_GE_512-NEXT: fadda d0, [[PG]], d0, [[OP]].d
				; VBITS_GE_512-NEXT: ret

				; Ensure sensible type legalisation.
				; VBITS_EQ_256-COUNT-8: fadd
				; VBITS_EQ_256: ret
				%op = load <8 x double>, <8 x double>* %a
				%res = call double @llvm.vector.reduce.fadd.v8f64(double %start, <8 x double> %op)
				ret double %res
				}

				define double @fadda_v16f64(double %start, <16 x double>* %a) #0 {
				; CHECK-LABEL: fadda_v16f64:
				; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].d, vl16
				; VBITS_GE_1024-NEXT: ld1d { [[OP:z[0-9]+]].d }, [[PG]]/z, [x0]
				; VBITS_GE_1024-NEXT: fadda d0, [[PG]], d0, [[OP]].d
				; VBITS_GE_1024-NEXT: ret
				%op = load <16 x double>, <16 x double>* %a
				%res = call double @llvm.vector.reduce.fadd.v16f64(double %start, <16 x double> %op)
				ret double %res
				}

				define double @fadda_v32f64(double %start, <32 x double>* %a) #0 {
				; CHECK-LABEL: fadda_v32f64:
				; VBITS_GE_2048: ptrue [[PG:p[0-9]+]].d, vl32
				; VBITS_GE_2048-NEXT: ld1d { [[OP:z[0-9]+]].d }, [[PG]]/z, [x0]
				; VBITS_GE_2048-NEXT: fadda d0, [[PG]], d0, [[OP]].d
				; VBITS_GE_2048-NEXT: ret
				%op = load <32 x double>, <32 x double>* %a
				%res = call double @llvm.vector.reduce.fadd.v32f64(double %start, <32 x double> %op)
				ret double %res
				}

				;
	; FADDV			; FADDV
	;			;

	; No single instruction NEON support for 4 element vectors.			; No single instruction NEON support for 4 element vectors.
	define half @faddv_v4f16(half %start, <4 x half> %a) #0 {			define half @faddv_v4f16(half %start, <4 x half> %a) #0 {
	; CHECK-LABEL: faddv_v4f16:			; CHECK-LABEL: faddv_v4f16:
	; CHECK: ptrue [[PG:p[0-9]+]].h, vl4			; CHECK: ptrue [[PG:p[0-9]+]].h, vl4
	; CHECK-NEXT: faddv [[RDX:h[0-9]+]], [[PG]], z1.h			; CHECK-NEXT: faddv [[RDX:h[0-9]+]], [[PG]], z1.h