This is an archive of the discontinued LLVM Phabricator instance.

Differential D90247

[AArch64] Add legalizations for VECREDUCE_SEQ_FADD
ClosedPublic

Authored by cameron.mcinally on Oct 27 2020, 9:39 AM.

Details

Reviewers

paulwalker-arm
kmclaughlin
nikic
efriedma
sdesmalen
rengolin
spatel

Commits

rGdda1e74b58bd: [Legalize] Add legalizations for VECREDUCE_SEQ_FADD

Summary

Continuing from D89162...

I think I stumbled across the motivation for the ExpandReductions pass. If we wait until Legalization to expand the ordered reductions, we end up with suboptimal code for illegal types. And it's pretty bad. We could end up with O(n/2) extra operations.

A good example of this can be seen in @test_v3f32 from vecreduce-fadd-legalization-strict.ll. Here we end up with 4 FADDs, instead of the 3 FADDs required. The newly added FADD is the result of widening the illegal v3f32 vector type to v4f32, where the newly added element in the reduction is the "neutral" value, 0.0.
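The operation count described above can be modeled with a small standalone sketch. This is not LLVM code: `reduceSeqFAdd`, `widen`, and `SeqReduceResult` are hypothetical names, and the model only assumes that an ordered reduction performs one scalar FADD per lane and that widening appends padding lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Value of a strictly ordered (left-to-right) fadd reduction, plus the
// number of scalar additions it took to compute.
struct SeqReduceResult {
  float Value;
  unsigned NumFAdds;
};

// Hypothetical helper: reduces Elts in lane order, starting from Start.
SeqReduceResult reduceSeqFAdd(float Start, const std::vector<float> &Elts) {
  SeqReduceResult R{Start, 0};
  for (float E : Elts) {
    R.Value = R.Value + E; // one FADD per lane; the order is fixed
    ++R.NumFAdds;
  }
  return R;
}

// Hypothetical helper: models type widening by appending Pad until the
// vector has WideLanes lanes.
std::vector<float> widen(std::vector<float> Elts, std::size_t WideLanes,
                         float Pad) {
  assert(WideLanes >= Elts.size() && "widening cannot drop lanes");
  Elts.resize(WideLanes, Pad);
  return Elts;
}
```

Reducing a three-lane vector takes three additions in this model, while reducing the same vector after `widen(V, 4, 0.0f)` takes four, because the padding lane is added like any other lane unless something later folds it away.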

Question is... should we be optimizing for exceptional cases? Eh, probably not IMHO. But I could see users filing bug reports for the extra operations. What does everyone else think?

[*Also note that I decided to turn these new legalizations on for NEON, and not just SVE. NEON had a number of existing tests, so it made sense to reuse them. And I'm still missing legalization support for some other actions, like SoftenFloat and PromoteFloat. I did not find existing tests for those actions.*]

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

cameron.mcinally created this revision. Oct 27 2020, 9:39 AM

Herald added a reviewer: rengolin. Oct 27 2020, 9:39 AM

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, danielkiss, hiraditya, kristof.beyls.

cameron.mcinally requested review of this revision. Oct 27 2020, 9:39 AM

A good example of this can be seen in @test_v3f32 from vecreduce-fadd-legalization-strict.ll. Here we end up with 4 FADDs, instead of the 3 FADDs required. The newly added FADD is the result of widening the illegal v3f32 vector type to v4f32, where the newly added element in the reduction is the "neutral" value, 0.0.

Looking at https://github.com/llvm/llvm-project/blob/5a3ef55a524bf9e072d98286e5febdb218b1fc72/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp#L7477-L7480, shouldn't this just be a matter of using -0.0 as the neutral element instead? If 0.0 is not actually neutral here, then this is not just suboptimal, it's incorrect. (We should fix this for the non-sequential case as well.)

And I'm still missing legalization support for some other actions, like SoftenFloat and PromoteFloat. I did not find existing tests for those actions.

Copying llvm/test/CodeGen/ARM/vecreduce-fadd-legalization-soft-float.ll without the fast fmf should work as test coverage for that.

cameron.mcinally added a reviewer: spatel. Oct 27 2020, 10:39 AM

In D90247#2357013, @nikic wrote:

A good example of this can be seen in @test_v3f32 from vecreduce-fadd-legalization-strict.ll. Here we end up with 4 FADDs, instead of the 3 FADDs required. The newly added FADD is the result of widening the illegal v3f32 vector type to v4f32, where the newly added element in the reduction is the "neutral" value, 0.0.

Looking at https://github.com/llvm/llvm-project/blob/5a3ef55a524bf9e072d98286e5febdb218b1fc72/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp#L7477-L7480, shouldn't this just be a matter of using -0.0 as the neutral element instead? If 0.0 is not actually neutral here, then this is not just suboptimal, it's incorrect. (We should fix this for the non-sequential case as well.)

@spatel

Yup, good catch. The (-0.0+0.0 -> +0.0) case is a problem. The identity should be -0.0.
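The sign-of-zero issue can be checked with plain IEEE-754 arithmetic, independent of LLVM. The sketch below assumes the default round-to-nearest mode and no fast-math flags; `preservesValueAndSign` is a hypothetical helper name used only for illustration.

```cpp
#include <cassert>
#include <cmath>

// Returns true if adding Candidate to X leaves X unchanged, including the
// sign of a zero result. A true neutral element must pass this for every X.
bool preservesValueAndSign(float X, float Candidate) {
  float Sum = X + Candidate;
  return Sum == X && std::signbit(Sum) == std::signbit(X);
}
```

With these assumptions, `preservesValueAndSign(-0.0f, 0.0f)` is false because `-0.0 + 0.0` yields `+0.0`, while `-0.0f` as the candidate passes for `-0.0f`, `0.0f`, and ordinary nonzero inputs.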

And I'm still missing legalization support for some other actions, like SoftenFloat and PromoteFloat. I did not find existing tests for those actions.

Copying llvm/test/CodeGen/ARM/vecreduce-fadd-legalization-soft-float.ll without the fast fmf should work as test coverage for that.

Ok, I can build that out. Are we okay with the suboptimal legalization though? I'll wait for that decision before putting more time into this.

Or does anyone see a clever fix for the illegal type legalization? It looks like we lost information during widening, so I'm not sure we can get it back in a non-hacky way.

Ok, I can build that out. Are we okay with the suboptimal legalization though? I'll wait for that decision before putting more time into this.

Or does anyone see a clever fix for the illegal type legalization? It looks like we lost information during widening, so I'm not sure we can get it back in a non-hacky way.

Not sure I follow. If the neutral element is fixed, then the extra fadds should also get folded away. Or is there some additional sub-optimality here?

In D90247#2357093, @nikic wrote:

Ok, I can build that out. Are we okay with the suboptimal legalization though? I'll wait for that decision before putting more time into this.

Or does anyone see a clever fix for the illegal type legalization? It looks like we lost information during widening, so I'm not sure we can get it back in a non-hacky way.

Not sure I follow. If the neutral element is fixed, then the extra fadds should also get folded away. Or is there some additional sub-optimality here?

Ah, I follow now. Didn't catch the connection...

Harbormaster completed remote builds in B76593: Diff 301027. Oct 27 2020, 11:38 AM

Update 'neutral' element to -0.0.

@nikic, I don't see llvm/test/CodeGen/AArch64/vecreduce-fadd-legalization-soft-float.ll. Is that a test file you have downstream? Or is it just missing for AArch64/?

Ah, I see it in ARM/. That will work...

In D90247#2357013, @nikic wrote:

Looking at https://github.com/llvm/llvm-project/blob/5a3ef55a524bf9e072d98286e5febdb218b1fc72/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp#L7477-L7480, shouldn't this just be a matter of using -0.0 as the neutral element instead? If 0.0 is not actually neutral here, then this is not just suboptimal, it's incorrect. (We should fix this for the non-sequential case as well.)

Agree - looks like the -0.0 was missed in D58015. Note that we have semi-redundant logic for this sort of thing in IR with ConstantExpr::getBinOpIdentity(), so we should confirm that everything is lined up and/or try to consolidate code.

Comment from ARM/ARMISelLowering.cpp:

// Custom Expand smaller than legal vector reductions to prevent false zero
// items being added.
setOperationAction(ISD::VECREDUCE_FADD, MVT::v4f16, Custom);

So it will probably take some work to truly unwind the 0.0 neutral element code. Just FYI...

In D90247#2357606, @cameron.mcinally wrote:
Comment from ARM/ARMISelLowering.cpp:
// Custom Expand smaller than legal vector reductions to prevent false zero
// items being added.
setOperationAction(ISD::VECREDUCE_FADD, MVT::v4f16, Custom);

Those are for MVE. Maybe try NEON instead?

Updated patch with, I think, all the needed legalizations.

This patch touches both the ARM and AArch64 backends, to exercise all the different legalizations. If this is too much to review in one patch, we might want to split it up into two Diffs, even if there's a small window of unsupported legalizations in the wild. It should be relatively safe to split.

Also note that my confidence in the softening test changes is low. E.g. @test_v4f32_strict. Is it obvious to anyone if the new instruction sequences are bad?

nikic added inline comments. Oct 29 2020, 2:10 PM

llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
713	Looks like there is no definition for this method. There's probably no way to test it (requires X86), but I'd suggest to still include it, as the implementation is the same as the other soft float legalization. I'd also be fine with not having it, in which case this declaration should be dropped as well.
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
826	As this is a recurring pattern, I have extracted an `ISD::getVecReduceBaseOpcode()` function that does this. Please add VECREDUCE_SEQ_FADD there and then use it here and in expansion.
4865	Again, this is a recurring pattern, so I have extracted a `SelectionDAG::getNeutralElement()` API for this. The invocation should be:
	SDValue NeutralElem = DAG.getNeutralElement(
	    ISD::getVecReduceBaseOpcode(N->getOpcode()), dl, ElemVT, Flags);
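The two-step lookup suggested in these comments (map the reduction opcode to its scalar base operation, then ask for that operation's neutral element) can be modeled outside LLVM roughly as follows. This is a toy sketch: the `Opcode` enum, `baseOpcode`, and `neutralElement` are illustrative stand-ins and do not reproduce the signatures or full opcode coverage of `ISD::getVecReduceBaseOpcode()` or `SelectionDAG::getNeutralElement()`.

```cpp
#include <cassert>
#include <cmath>

// Toy stand-ins for a handful of opcodes; not LLVM's actual ISD enum.
enum class Opcode {
  FADD,
  FMUL,
  VECREDUCE_FADD,
  VECREDUCE_FMUL,
  VECREDUCE_SEQ_FADD
};

// Maps a reduction opcode to the scalar operation it applies repeatedly.
// The ordered and unordered fadd reductions share the same base operation.
Opcode baseOpcode(Opcode VecReduceOpc) {
  switch (VecReduceOpc) {
  case Opcode::VECREDUCE_FADD:
  case Opcode::VECREDUCE_SEQ_FADD:
    return Opcode::FADD;
  case Opcode::VECREDUCE_FMUL:
    return Opcode::FMUL;
  default:
    assert(false && "not a reduction opcode in this toy model");
    return VecReduceOpc;
  }
}

// Neutral element of a scalar base operation: x op neutral == x for all x.
float neutralElement(Opcode BaseOpc) {
  switch (BaseOpc) {
  case Opcode::FADD:
    return -0.0f; // -0.0, not +0.0, so that -0.0 + neutral stays -0.0
  case Opcode::FMUL:
    return 1.0f;
  default:
    assert(false && "no neutral element defined in this toy model");
    return 0.0f;
  }
}
```

Keeping the list of reduction opcodes in one mapping function means the neutral-element lookup only needs to know about scalar operations, which matches the preference stated later in the thread for not repeating the full VECREDUCE list.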

Update patch based on @nikic's comments...

llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h
713	Implemented.
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
4865	I improvised a little here for readability. Is that okay with you, @nikic? Although, I wonder if getting the BaseOpc is a necessary step here. Did you consider just adding VECREDUCE_* cases to `DAG.getNeutralElement(...)`?

LGTM with the default action adjusted...

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
4865	"I improvised a little here for readability. Is that okay with you, @nikic?"
	Sure!
	"Although, I wonder if getting the BaseOpc is a necessary step here. Did you consider just adding VECREDUCE_* cases to `DAG.getNeutralElement(...)`?"
	Not strictly necessary, but I personally found it a bit more elegant to not repeat the full VECREDUCE list another time here.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
929 ↗	(On Diff #301993)	Rather than explicitly marking them as expanded in targets, VECREDUCE_SEQ_FADD should be marked as default expanded using the list at https://github.com/llvm/llvm-project/blob/aa1c6b79878475d61c90c0d4c2af17312242f18e/llvm/lib/CodeGen/TargetLoweringBase.cpp#L722, to stay consistent with other VECREDUCE opcodes.

This revision is now accepted and ready to land. Oct 30 2020, 1:06 PM

cameron.mcinally added inline comments. Oct 30 2020, 2:02 PM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
929 ↗	(On Diff #301993)	Oh, cool. Didn't know that list existed. Thanks.

Closed by commit rGdda1e74b58bd: [Legalize] Add legalizations for VECREDUCE_SEQ_FADD (authored by cameron.mcinally). Oct 30 2020, 2:03 PM

This revision was automatically updated to reflect the committed changes.

cameron.mcinally added a commit: rGdda1e74b58bd: [Legalize] Add legalizations for VECREDUCE_SEQ_FADD.

cameron.mcinally mentioned this in D90644: [Legalizer][ARM][AArch64] Add legalizations for VECREDUCE_SEQ_FMUL. Nov 2 2020, 1:45 PM

cameron.mcinally mentioned this in rGc126eb7529be: [SelectionDAG] Add legalizations for VECREDUCE_SEQ_FMUL. Nov 4 2020, 12:20 PM

spatel mentioned this in D96552: [Vectorizers][TTI] remove option to bypass creation of vector reduction intrinsics. Feb 11 2021, 2:36 PM

spatel mentioned this in rG79b1b4a58151: [Vectorizers][TTI] remove option to bypass creation of vector reduction…. Feb 12 2021, 5:34 AM

Revision Contents

Path	Size
llvm/include/llvm/CodeGen/TargetLowering.h	3 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp	4 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp	24 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h	6 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp	11 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp	74 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp	1 line
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp	22 lines
llvm/lib/CodeGen/TargetLoweringBase.cpp	1 line
llvm/lib/Target/AArch64/AArch64ISelLowering.h	14 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h	11 lines
llvm/lib/Target/ARM/ARMTargetTransformInfo.h	3 lines
llvm/test/CodeGen/AArch64/sve-fixed-length-fp-reduce.ll	27 lines
llvm/test/CodeGen/AArch64/vecreduce-fadd-legalization-strict.ll	94 lines

Diff 302011

llvm/include/llvm/CodeGen/TargetLowering.h

@@ ... @@
    /// expansion was successful and populates the Result and Overflow arguments.
    bool expandMULO(SDNode *Node, SDValue &Result, SDValue &Overflow,
                    SelectionDAG &DAG) const;

    /// Expand a VECREDUCE_* into an explicit calculation. If Count is specified,
    /// only the first Count elements of the vector are used.
    SDValue expandVecReduce(SDNode *Node, SelectionDAG &DAG) const;

+   /// Expand a VECREDUCE_SEQ_* into an explicit ordered calculation.
+   SDValue expandVecReduceSeq(SDNode *Node, SelectionDAG &DAG) const;
+
    /// Expand an SREM or UREM using SDIV/UDIV or SDIVREM/UDIVREM, if legal.
    /// Returns true if the expansion was successful.
    bool expandREM(SDNode *Node, SDValue &Result, SelectionDAG &DAG) const;

    //===--------------------------------------------------------------------===//
    // Instruction Emitting Hooks
    //

@@ ... @@

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

@@ ... @@ #endif
    case ISD::VECREDUCE_SMIN:
    case ISD::VECREDUCE_UMAX:
    case ISD::VECREDUCE_UMIN:
    case ISD::VECREDUCE_FMAX:
    case ISD::VECREDUCE_FMIN:
      Action = TLI.getOperationAction(
          Node->getOpcode(), Node->getOperand(0).getValueType());
      break;
+   case ISD::VECREDUCE_SEQ_FADD:
+     Action = TLI.getOperationAction(
+         Node->getOpcode(), Node->getOperand(1).getValueType());
+     break;
    default:
      if (Node->getOpcode() >= ISD::BUILTIN_OP_END) {
        Action = TargetLowering::Legal;
      } else {
        Action = TLI.getOperationAction(Node->getOpcode(), Node->getValueType(0));
      }
      break;
    }
@@ ... @@

llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp

@@ ... @@ #endif
      case ISD::UNDEF: R = SoftenFloatRes_UNDEF(N); break;
      case ISD::VAARG: R = SoftenFloatRes_VAARG(N); break;
      case ISD::VECREDUCE_FADD:
      case ISD::VECREDUCE_FMUL:
      case ISD::VECREDUCE_FMIN:
      case ISD::VECREDUCE_FMAX:
        R = SoftenFloatRes_VECREDUCE(N);
        break;
+     case ISD::VECREDUCE_SEQ_FADD:
+       R = SoftenFloatRes_VECREDUCE_SEQ(N);
+       break;
    }

    // If R is null, the sub-method took care of registering the result.
    if (R.getNode()) {
      assert(R.getNode() != N);
      SetSoftenedFloat(SDValue(N, ResNo), R);
    }
  }
@@ ... @@
  }

  SDValue DAGTypeLegalizer::SoftenFloatRes_VECREDUCE(SDNode *N) {
    // Expand and soften recursively.
    ReplaceValueWith(SDValue(N, 0), TLI.expandVecReduce(N, DAG));
    return SDValue();
  }

+ SDValue DAGTypeLegalizer::SoftenFloatRes_VECREDUCE_SEQ(SDNode *N) {
+   ReplaceValueWith(SDValue(N, 0), TLI.expandVecReduceSeq(N, DAG));
+   return SDValue();
+ }
+
  //===----------------------------------------------------------------------===//
  // Convert Float Operand to Integer
  //===----------------------------------------------------------------------===//

  bool DAGTypeLegalizer::SoftenFloatOperand(SDNode *N, unsigned OpNo) {
    LLVM_DEBUG(dbgs() << "Soften float operand " << OpNo << ": "; N->dump(&DAG);
               dbgs() << "\n");
@@ ... @@ #endif
      case ISD::UNDEF: R = PromoteFloatRes_UNDEF(N); break;
      case ISD::ATOMIC_SWAP: R = BitcastToInt_ATOMIC_SWAP(N); break;
      case ISD::VECREDUCE_FADD:
      case ISD::VECREDUCE_FMUL:
      case ISD::VECREDUCE_FMIN:
      case ISD::VECREDUCE_FMAX:
        R = PromoteFloatRes_VECREDUCE(N);
        break;
+     case ISD::VECREDUCE_SEQ_FADD:
+       R = PromoteFloatRes_VECREDUCE_SEQ(N);
+       break;
    }

    if (R.getNode())
      SetPromotedFloat(SDValue(N, ResNo), R);
  }

  // Bitcast from i16 to f16: convert the i16 to a f32 value instead.
  // At this point, it is not possible to determine if the bitcast value is
@@ ... @@ SDValue DAGTypeLegalizer::PromoteFloatRes_VECREDUCE(SDNode *N) {
    // Expand and promote recursively.
    // TODO: This is non-optimal, but dealing with the concurrently happening
    // vector-legalization is non-trivial. We could do something similar to
    // PromoteFloatRes_EXTRACT_VECTOR_ELT here.
    ReplaceValueWith(SDValue(N, 0), TLI.expandVecReduce(N, DAG));
    return SDValue();
  }

+ SDValue DAGTypeLegalizer::PromoteFloatRes_VECREDUCE_SEQ(SDNode *N) {
+   ReplaceValueWith(SDValue(N, 0), TLI.expandVecReduceSeq(N, DAG));
+   return SDValue();
+ }
+
  SDValue DAGTypeLegalizer::BitcastToInt_ATOMIC_SWAP(SDNode *N) {
    EVT VT = N->getValueType(0);

    AtomicSDNode *AM = cast<AtomicSDNode>(N);
    SDLoc SL(N);

    SDValue CastVal = BitConvertToInteger(AM->getVal());
    EVT CastVT = CastVal.getValueType();
@@ ... @@ #endif
      case ISD::UNDEF: R = SoftPromoteHalfRes_UNDEF(N); break;
      case ISD::ATOMIC_SWAP: R = BitcastToInt_ATOMIC_SWAP(N); break;
      case ISD::VECREDUCE_FADD:
      case ISD::VECREDUCE_FMUL:
      case ISD::VECREDUCE_FMIN:
      case ISD::VECREDUCE_FMAX:
        R = SoftPromoteHalfRes_VECREDUCE(N);
        break;
+     case ISD::VECREDUCE_SEQ_FADD:
+       R = SoftPromoteHalfRes_VECREDUCE_SEQ(N);
+       break;
    }

    if (R.getNode())
      SetSoftPromotedHalf(SDValue(N, ResNo), R);
  }

  SDValue DAGTypeLegalizer::SoftPromoteHalfRes_BITCAST(SDNode *N) {
    return BitConvertToInteger(N->getOperand(0));
@@ ... @@
  }

  SDValue DAGTypeLegalizer::SoftPromoteHalfRes_VECREDUCE(SDNode *N) {
    // Expand and soften recursively.
    ReplaceValueWith(SDValue(N, 0), TLI.expandVecReduce(N, DAG));
    return SDValue();
  }

+ SDValue DAGTypeLegalizer::SoftPromoteHalfRes_VECREDUCE_SEQ(SDNode *N) {
+   // Expand and soften.
+   ReplaceValueWith(SDValue(N, 0), TLI.expandVecReduceSeq(N, DAG));
+   return SDValue();
+ }
+
  //===----------------------------------------------------------------------===//
  // Half Operand Soft Promotion
  //===----------------------------------------------------------------------===//

  bool DAGTypeLegalizer::SoftPromoteHalfOperand(SDNode *N, unsigned OpNo) {
    LLVM_DEBUG(dbgs() << "Soft promote half operand " << OpNo << ": ";
               N->dump(&DAG); dbgs() << "\n");
    SDValue Res = SDValue();
@@ ... @@

llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h

@@ ... @@ private:
    SDValue SoftenFloatRes_FTRUNC(SDNode *N);
    SDValue SoftenFloatRes_LOAD(SDNode *N);
    SDValue SoftenFloatRes_SELECT(SDNode *N);
    SDValue SoftenFloatRes_SELECT_CC(SDNode *N);
    SDValue SoftenFloatRes_UNDEF(SDNode *N);
    SDValue SoftenFloatRes_VAARG(SDNode *N);
    SDValue SoftenFloatRes_XINT_TO_FP(SDNode *N);
    SDValue SoftenFloatRes_VECREDUCE(SDNode *N);
+   SDValue SoftenFloatRes_VECREDUCE_SEQ(SDNode *N);

    // Convert Float Operand to Integer.
    bool SoftenFloatOperand(SDNode *N, unsigned OpNo);
    SDValue SoftenFloatOp_Unary(SDNode *N, RTLIB::Libcall LC);
    SDValue SoftenFloatOp_BITCAST(SDNode *N);
    SDValue SoftenFloatOp_BR_CC(SDNode *N);
    SDValue SoftenFloatOp_FP_ROUND(SDNode *N);
    SDValue SoftenFloatOp_FP_TO_XINT(SDNode *N);
@@ ... @@ private:
    SDValue PromoteFloatRes_LOAD(SDNode *N);
    SDValue PromoteFloatRes_SELECT(SDNode *N);
    SDValue PromoteFloatRes_SELECT_CC(SDNode *N);
    SDValue PromoteFloatRes_UnaryOp(SDNode *N);
    SDValue PromoteFloatRes_UNDEF(SDNode *N);
    SDValue BitcastToInt_ATOMIC_SWAP(SDNode *N);
    SDValue PromoteFloatRes_XINT_TO_FP(SDNode *N);
    SDValue PromoteFloatRes_VECREDUCE(SDNode *N);
+   SDValue PromoteFloatRes_VECREDUCE_SEQ(SDNode *N);

    bool PromoteFloatOperand(SDNode *N, unsigned OpNo);
    SDValue PromoteFloatOp_BITCAST(SDNode *N, unsigned OpNo);
    SDValue PromoteFloatOp_FCOPYSIGN(SDNode *N, unsigned OpNo);
    SDValue PromoteFloatOp_FP_EXTEND(SDNode *N, unsigned OpNo);
    SDValue PromoteFloatOp_FP_TO_XINT(SDNode *N, unsigned OpNo);
    SDValue PromoteFloatOp_STORE(SDNode *N, unsigned OpNo);
    SDValue PromoteFloatOp_SELECT_CC(SDNode *N, unsigned OpNo);
@@ ... @@ private:
    SDValue SoftPromoteHalfRes_FP_ROUND(SDNode *N);
    SDValue SoftPromoteHalfRes_LOAD(SDNode *N);
    SDValue SoftPromoteHalfRes_SELECT(SDNode *N);
    SDValue SoftPromoteHalfRes_SELECT_CC(SDNode *N);
    SDValue SoftPromoteHalfRes_UnaryOp(SDNode *N);
    SDValue SoftPromoteHalfRes_XINT_TO_FP(SDNode *N);
    SDValue SoftPromoteHalfRes_UNDEF(SDNode *N);
    SDValue SoftPromoteHalfRes_VECREDUCE(SDNode *N);
+   SDValue SoftPromoteHalfRes_VECREDUCE_SEQ(SDNode *N);
	nikic: Looks like there is no definition for this method. There's probably no way to test it (requires X86), but I'd suggest to still include it, as the implementation is the same as the other soft float legalization. I'd also be fine with not having it, in which case this declaration should be dropped as well.
	cameron.mcinally: Implemented.

    bool SoftPromoteHalfOperand(SDNode *N, unsigned OpNo);
    SDValue SoftPromoteHalfOp_BITCAST(SDNode *N);
    SDValue SoftPromoteHalfOp_FCOPYSIGN(SDNode *N, unsigned OpNo);
    SDValue SoftPromoteHalfOp_FP_EXTEND(SDNode *N);
    SDValue SoftPromoteHalfOp_FP_TO_XINT(SDNode *N);
    SDValue SoftPromoteHalfOp_SETCC(SDNode *N);
    SDValue SoftPromoteHalfOp_SELECT_CC(SDNode *N, unsigned OpNo);
@@ ... @@ private:
    SDValue ScalarizeVecOp_CONCAT_VECTORS(SDNode *N);
    SDValue ScalarizeVecOp_EXTRACT_VECTOR_ELT(SDNode *N);
    SDValue ScalarizeVecOp_VSELECT(SDNode *N);
    SDValue ScalarizeVecOp_VSETCC(SDNode *N);
    SDValue ScalarizeVecOp_STORE(StoreSDNode *N, unsigned OpNo);
    SDValue ScalarizeVecOp_FP_ROUND(SDNode *N, unsigned OpNo);
    SDValue ScalarizeVecOp_STRICT_FP_ROUND(SDNode *N, unsigned OpNo);
    SDValue ScalarizeVecOp_VECREDUCE(SDNode *N);
+   SDValue ScalarizeVecOp_VECREDUCE_SEQ(SDNode *N);

    //===--------------------------------------------------------------------===//
    // Vector Splitting Support: LegalizeVectorTypes.cpp
    //===--------------------------------------------------------------------===//

    /// Given a processed vector Op which was split into vectors of half the size,
    /// this method returns the halves. The first elements of Op coincide with the
    /// elements of Lo; the remaining elements of Op coincide with the elements of
@@ ... @@ private:
    void SplitVecRes_VECTOR_SHUFFLE(ShuffleVectorSDNode *N, SDValue &Lo,
                                    SDValue &Hi);
    void SplitVecRes_VAARG(SDNode *N, SDValue &Lo, SDValue &Hi);

    // Vector Operand Splitting: <128 x ty> -> 2 x <64 x ty>.
    bool SplitVectorOperand(SDNode *N, unsigned OpNo);
    SDValue SplitVecOp_VSELECT(SDNode *N, unsigned OpNo);
    SDValue SplitVecOp_VECREDUCE(SDNode *N, unsigned OpNo);
+   SDValue SplitVecOp_VECREDUCE_SEQ(SDNode *N);
    SDValue SplitVecOp_UnaryOp(SDNode *N);
    SDValue SplitVecOp_TruncateHelper(SDNode *N);

    SDValue SplitVecOp_BITCAST(SDNode *N);
    SDValue SplitVecOp_EXTRACT_SUBVECTOR(SDNode *N);
    SDValue SplitVecOp_EXTRACT_VECTOR_ELT(SDNode *N);
    SDValue SplitVecOp_ExtVecInRegOp(SDNode *N);
    SDValue SplitVecOp_STORE(StoreSDNode *N, unsigned OpNo);
@@ ... @@ private:
    SDValue WidenVecOp_MSCATTER(SDNode* N, unsigned OpNo);
    SDValue WidenVecOp_SETCC(SDNode* N);
    SDValue WidenVecOp_STRICT_FSETCC(SDNode* N);
    SDValue WidenVecOp_VSELECT(SDNode *N);

    SDValue WidenVecOp_Convert(SDNode *N);
    SDValue WidenVecOp_FCOPYSIGN(SDNode *N);
    SDValue WidenVecOp_VECREDUCE(SDNode *N);
+   SDValue WidenVecOp_VECREDUCE_SEQ(SDNode *N);

    /// Helper function to generate a set of operations to perform
    /// a vector operation for a wider type.
    ///
    SDValue UnrollVectorOp_StrictFP(SDNode *N, unsigned ResNE);

    //===--------------------------------------------------------------------===//
    // Vector Widening Utilities Support: LegalizeVectorTypes.cpp
@@ ... @@

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp

Show First 20 Lines • Show All 465 Lines • ▼ Show 20 Lines	#include "llvm/IR/ConstrainedOps.def"
case ISD::SDIVFIXSAT:		case ISD::SDIVFIXSAT:
case ISD::UDIVFIX:		case ISD::UDIVFIX:
case ISD::UDIVFIXSAT: {		case ISD::UDIVFIXSAT: {
unsigned Scale = Node->getConstantOperandVal(2);		unsigned Scale = Node->getConstantOperandVal(2);
Action = TLI.getFixedPointOperationAction(Node->getOpcode(),		Action = TLI.getFixedPointOperationAction(Node->getOpcode(),
Node->getValueType(0), Scale);		Node->getValueType(0), Scale);
break;		break;
}		}
case ISD::VECREDUCE_SEQ_FADD:
Action = TLI.getOperationAction(Node->getOpcode(),
Node->getOperand(1).getValueType());
break;
case ISD::SINT_TO_FP:		case ISD::SINT_TO_FP:
case ISD::UINT_TO_FP:		case ISD::UINT_TO_FP:
case ISD::VECREDUCE_ADD:		case ISD::VECREDUCE_ADD:
case ISD::VECREDUCE_MUL:		case ISD::VECREDUCE_MUL:
case ISD::VECREDUCE_AND:		case ISD::VECREDUCE_AND:
case ISD::VECREDUCE_OR:		case ISD::VECREDUCE_OR:
case ISD::VECREDUCE_XOR:		case ISD::VECREDUCE_XOR:
case ISD::VECREDUCE_SMAX:		case ISD::VECREDUCE_SMAX:
case ISD::VECREDUCE_SMIN:		case ISD::VECREDUCE_SMIN:
case ISD::VECREDUCE_UMAX:		case ISD::VECREDUCE_UMAX:
case ISD::VECREDUCE_UMIN:		case ISD::VECREDUCE_UMIN:
case ISD::VECREDUCE_FADD:		case ISD::VECREDUCE_FADD:
case ISD::VECREDUCE_FMUL:		case ISD::VECREDUCE_FMUL:
case ISD::VECREDUCE_FMAX:		case ISD::VECREDUCE_FMAX:
case ISD::VECREDUCE_FMIN:		case ISD::VECREDUCE_FMIN:
Action = TLI.getOperationAction(Node->getOpcode(),		Action = TLI.getOperationAction(Node->getOpcode(),
Node->getOperand(0).getValueType());		Node->getOperand(0).getValueType());
break;		break;
		case ISD::VECREDUCE_SEQ_FADD:
		Action = TLI.getOperationAction(Node->getOpcode(),
		Node->getOperand(1).getValueType());
		break;
}		}

LLVM_DEBUG(dbgs() << "\nLegalizing vector op: "; Node->dump(&DAG));		LLVM_DEBUG(dbgs() << "\nLegalizing vector op: "; Node->dump(&DAG));

SmallVector<SDValue, 8> ResultVals;		SmallVector<SDValue, 8> ResultVals;
switch (Action) {		switch (Action) {
default: llvm_unreachable("This action is not supported yet!");		default: llvm_unreachable("This action is not supported yet!");
case TargetLowering::Promote:		case TargetLowering::Promote:
▲ Show 20 Lines • Show All 365 Lines • ▼ Show 20 Lines	#include "llvm/IR/ConstrainedOps.def"
case ISD::VECREDUCE_UMAX:		case ISD::VECREDUCE_UMAX:
case ISD::VECREDUCE_UMIN:		case ISD::VECREDUCE_UMIN:
case ISD::VECREDUCE_FADD:		case ISD::VECREDUCE_FADD:
case ISD::VECREDUCE_FMUL:		case ISD::VECREDUCE_FMUL:
case ISD::VECREDUCE_FMAX:		case ISD::VECREDUCE_FMAX:
case ISD::VECREDUCE_FMIN:		case ISD::VECREDUCE_FMIN:
Results.push_back(TLI.expandVecReduce(Node, DAG));		Results.push_back(TLI.expandVecReduce(Node, DAG));
return;		return;
		case ISD::VECREDUCE_SEQ_FADD:
		Results.push_back(TLI.expandVecReduceSeq(Node, DAG));
		return;
case ISD::SREM:		case ISD::SREM:
case ISD::UREM:		case ISD::UREM:
ExpandREM(Node, Results);		ExpandREM(Node, Results);
return;		return;
}		}

Results.push_back(DAG.UnrollVectorOp(Node));		Results.push_back(DAG.UnrollVectorOp(Node));
}		}
▲ Show 20 Lines • Show All 584 Lines • Show Last 20 Lines

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

Show First 20 Lines • Show All 617 Lines • ▼ Show 20 Lines	#endif
case ISD::VECREDUCE_SMAX:		case ISD::VECREDUCE_SMAX:
case ISD::VECREDUCE_SMIN:		case ISD::VECREDUCE_SMIN:
case ISD::VECREDUCE_UMAX:		case ISD::VECREDUCE_UMAX:
case ISD::VECREDUCE_UMIN:		case ISD::VECREDUCE_UMIN:
case ISD::VECREDUCE_FMAX:		case ISD::VECREDUCE_FMAX:
case ISD::VECREDUCE_FMIN:		case ISD::VECREDUCE_FMIN:
Res = ScalarizeVecOp_VECREDUCE(N);		Res = ScalarizeVecOp_VECREDUCE(N);
break;		break;
		case ISD::VECREDUCE_SEQ_FADD:
		Res = ScalarizeVecOp_VECREDUCE_SEQ(N);
		break;
}		}
}		}

// If the result is null, the sub-method took care of registering results etc.		// If the result is null, the sub-method took care of registering results etc.
if (!Res.getNode()) return false;		if (!Res.getNode()) return false;

// If the result is N, the sub-method updated N in place. Tell the legalizer		// If the result is N, the sub-method updated N in place. Tell the legalizer
// core about this.		// core about this.
▲ Show 20 Lines • Show All 164 Lines • ▼ Show 20 Lines
SDValue DAGTypeLegalizer::ScalarizeVecOp_VECREDUCE(SDNode *N) {		SDValue DAGTypeLegalizer::ScalarizeVecOp_VECREDUCE(SDNode *N) {
SDValue Res = GetScalarizedVector(N->getOperand(0));		SDValue Res = GetScalarizedVector(N->getOperand(0));
// Result type may be wider than element type.		// Result type may be wider than element type.
if (Res.getValueType() != N->getValueType(0))		if (Res.getValueType() != N->getValueType(0))
Res = DAG.getNode(ISD::ANY_EXTEND, SDLoc(N), N->getValueType(0), Res);		Res = DAG.getNode(ISD::ANY_EXTEND, SDLoc(N), N->getValueType(0), Res);
return Res;		return Res;
}		}

		SDValue DAGTypeLegalizer::ScalarizeVecOp_VECREDUCE_SEQ(SDNode *N) {
		SDValue AccOp = N->getOperand(0);
		SDValue VecOp = N->getOperand(1);

		unsigned BaseOpc = ISD::getVecReduceBaseOpcode(N->getOpcode());

		SDValue Op = GetScalarizedVector(VecOp);
		return DAG.getNode(BaseOpc, SDLoc(N), N->getValueType(0),
		AccOp, Op, N->getFlags());
		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Result Vector Splitting		// Result Vector Splitting
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

/// This method is called when the specified result of the specified node is		/// This method is called when the specified result of the specified node is
/// found to need vector splitting. At this point, the node may also have		/// found to need vector splitting. At this point, the node may also have
/// invalid operands or may have other results that need legalization, we just		/// invalid operands or may have other results that need legalization, we just
nikic: As this is a recurring pattern, I have extracted an `ISD::getVecReduceBaseOpcode()` function that does this. Please add VECREDUCE_SEQ_FADD there and then use it here and in expansion.
/// know that (at least) one result needs vector splitting.		/// know that (at least) one result needs vector splitting.
void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) {		void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) {
LLVM_DEBUG(dbgs() << "Split node result: "; N->dump(&DAG); dbgs() << "\n");		LLVM_DEBUG(dbgs() << "Split node result: "; N->dump(&DAG); dbgs() << "\n");
SDValue Lo, Hi;		SDValue Lo, Hi;

// See if the target wants to custom expand this node.		// See if the target wants to custom expand this node.
if (CustomLowerNode(N, N->getValueType(ResNo), true))		if (CustomLowerNode(N, N->getValueType(ResNo), true))
return;		return;
▲ Show 20 Lines • Show All 1,249 Lines • ▼ Show 20 Lines	#endif
case ISD::VECREDUCE_SMAX:		case ISD::VECREDUCE_SMAX:
case ISD::VECREDUCE_SMIN:		case ISD::VECREDUCE_SMIN:
case ISD::VECREDUCE_UMAX:		case ISD::VECREDUCE_UMAX:
case ISD::VECREDUCE_UMIN:		case ISD::VECREDUCE_UMIN:
case ISD::VECREDUCE_FMAX:		case ISD::VECREDUCE_FMAX:
case ISD::VECREDUCE_FMIN:		case ISD::VECREDUCE_FMIN:
Res = SplitVecOp_VECREDUCE(N, OpNo);		Res = SplitVecOp_VECREDUCE(N, OpNo);
break;		break;
		case ISD::VECREDUCE_SEQ_FADD:
		Res = SplitVecOp_VECREDUCE_SEQ(N);
		break;
}		}
}		}

// If the result is null, the sub-method took care of registering results etc.		// If the result is null, the sub-method took care of registering results etc.
if (!Res.getNode()) return false;		if (!Res.getNode()) return false;

// If the result is N, the sub-method updated N in place. Tell the legalizer		// If the result is N, the sub-method updated N in place. Tell the legalizer
// core about this.		// core about this.
▲ Show 20 Lines • Show All 59 Lines • ▼ Show 20 Lines	SDValue DAGTypeLegalizer::SplitVecOp_VECREDUCE(SDNode *N, unsigned OpNo) {

// Use the appropriate scalar instruction on the split subvectors before		// Use the appropriate scalar instruction on the split subvectors before
// reducing the now partially reduced smaller vector.		// reducing the now partially reduced smaller vector.
unsigned CombineOpc = ISD::getVecReduceBaseOpcode(N->getOpcode());		unsigned CombineOpc = ISD::getVecReduceBaseOpcode(N->getOpcode());
SDValue Partial = DAG.getNode(CombineOpc, dl, LoOpVT, Lo, Hi, N->getFlags());		SDValue Partial = DAG.getNode(CombineOpc, dl, LoOpVT, Lo, Hi, N->getFlags());
return DAG.getNode(N->getOpcode(), dl, ResVT, Partial, N->getFlags());		return DAG.getNode(N->getOpcode(), dl, ResVT, Partial, N->getFlags());
}		}

		SDValue DAGTypeLegalizer::SplitVecOp_VECREDUCE_SEQ(SDNode *N) {
		EVT ResVT = N->getValueType(0);
		SDValue Lo, Hi;
		SDLoc dl(N);

		SDValue AccOp = N->getOperand(0);
		SDValue VecOp = N->getOperand(1);
		SDNodeFlags Flags = N->getFlags();

		EVT VecVT = VecOp.getValueType();
		assert(VecVT.isVector() && "Can only split reduce vector operand");
		GetSplitVector(VecOp, Lo, Hi);
		EVT LoOpVT, HiOpVT;
		std::tie(LoOpVT, HiOpVT) = DAG.GetSplitDestVTs(VecVT);

		// Reduce low half.
		SDValue Partial = DAG.getNode(N->getOpcode(), dl, ResVT, AccOp, Lo, Flags);

		// Reduce high half, using low half result as initial value.
		return DAG.getNode(N->getOpcode(), dl, ResVT, Partial, Hi, Flags);
		}
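
The split above relies on an ordered reduction being resumable: reducing the low half from the accumulator and then feeding that partial result into the high half performs the same additions in the same order as a single pass. A minimal standalone sketch of that argument follows. It is a plain scalar model, not LLVM code; `seqFAdd` and `splitSeqFAdd` are hypothetical names introduced only for illustration, and the model assumes an even element count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of an ordered FADD reduction:
// ((acc + v[0]) + v[1]) + ... evaluated strictly left to right.
float seqFAdd(float acc, const std::vector<float> &v) {
  for (float x : v)
    acc = acc + x;
  return acc;
}

// Hypothetical model of the split: reduce the low half starting from acc,
// then reduce the high half starting from the low-half result.
float splitSeqFAdd(float acc, const std::vector<float> &v) {
  assert(v.size() % 2 == 0 && "model assumes an even element count");
  std::size_t half = v.size() / 2;
  std::vector<float> lo(v.begin(), v.begin() + half);
  std::vector<float> hi(v.begin() + half, v.end());
  float partial = seqFAdd(acc, lo); // "Reduce low half."
  return seqFAdd(partial, hi);      // "Reduce high half" seeded with partial.
}
```

Because both functions perform the same additions in the same order, they should agree bit for bit on any input, including inputs where rounding makes the sum order-sensitive.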

SDValue DAGTypeLegalizer::SplitVecOp_UnaryOp(SDNode *N) {		SDValue DAGTypeLegalizer::SplitVecOp_UnaryOp(SDNode *N) {
// The result has a legal vector type, but the input needs splitting.		// The result has a legal vector type, but the input needs splitting.
EVT ResVT = N->getValueType(0);		EVT ResVT = N->getValueType(0);
SDValue Lo, Hi;		SDValue Lo, Hi;
SDLoc dl(N);		SDLoc dl(N);
GetSplitVector(N->getOperand(N->isStrictFPOpcode() ? 1 : 0), Lo, Hi);		GetSplitVector(N->getOperand(N->isStrictFPOpcode() ? 1 : 0), Lo, Hi);
EVT InVT = Lo.getValueType();		EVT InVT = Lo.getValueType();

▲ Show 20 Lines • Show All 2,152 Lines • ▼ Show 20 Lines	#endif
case ISD::VECREDUCE_SMAX:		case ISD::VECREDUCE_SMAX:
case ISD::VECREDUCE_SMIN:		case ISD::VECREDUCE_SMIN:
case ISD::VECREDUCE_UMAX:		case ISD::VECREDUCE_UMAX:
case ISD::VECREDUCE_UMIN:		case ISD::VECREDUCE_UMIN:
case ISD::VECREDUCE_FMAX:		case ISD::VECREDUCE_FMAX:
case ISD::VECREDUCE_FMIN:		case ISD::VECREDUCE_FMIN:
Res = WidenVecOp_VECREDUCE(N);		Res = WidenVecOp_VECREDUCE(N);
break;		break;
		case ISD::VECREDUCE_SEQ_FADD:
		Res = WidenVecOp_VECREDUCE_SEQ(N);
		break;
}		}

// If Res is null, the sub-method took care of registering the result.		// If Res is null, the sub-method took care of registering the result.
if (!Res.getNode()) return false;		if (!Res.getNode()) return false;

// If the result is N, the sub-method updated N in place. Tell the legalizer		// If the result is N, the sub-method updated N in place. Tell the legalizer
// core about this.		// core about this.
if (Res.getNode() == N)		if (Res.getNode() == N)
▲ Show 20 Lines • Show All 423 Lines • ▼ Show 20 Lines
SDValue DAGTypeLegalizer::WidenVecOp_VECREDUCE(SDNode *N) {		SDValue DAGTypeLegalizer::WidenVecOp_VECREDUCE(SDNode *N) {
SDLoc dl(N);		SDLoc dl(N);
SDValue Op = GetWidenedVector(N->getOperand(0));		SDValue Op = GetWidenedVector(N->getOperand(0));
EVT OrigVT = N->getOperand(0).getValueType();		EVT OrigVT = N->getOperand(0).getValueType();
EVT WideVT = Op.getValueType();		EVT WideVT = Op.getValueType();
EVT ElemVT = OrigVT.getVectorElementType();		EVT ElemVT = OrigVT.getVectorElementType();
SDNodeFlags Flags = N->getFlags();		SDNodeFlags Flags = N->getFlags();

SDValue NeutralElem = DAG.getNeutralElement(		unsigned Opc = N->getOpcode();
ISD::getVecReduceBaseOpcode(N->getOpcode()), dl, ElemVT, Flags);		unsigned BaseOpc = ISD::getVecReduceBaseOpcode(Opc);
		SDValue NeutralElem = DAG.getNeutralElement(BaseOpc, dl, ElemVT, Flags);
assert(NeutralElem && "Neutral element must exist");		assert(NeutralElem && "Neutral element must exist");

// Pad the vector with the neutral element.		// Pad the vector with the neutral element.
unsigned OrigElts = OrigVT.getVectorNumElements();		unsigned OrigElts = OrigVT.getVectorNumElements();
unsigned WideElts = WideVT.getVectorNumElements();		unsigned WideElts = WideVT.getVectorNumElements();
for (unsigned Idx = OrigElts; Idx < WideElts; Idx++)		for (unsigned Idx = OrigElts; Idx < WideElts; Idx++)
Op = DAG.getNode(ISD::INSERT_VECTOR_ELT, dl, WideVT, Op, NeutralElem,		Op = DAG.getNode(ISD::INSERT_VECTOR_ELT, dl, WideVT, Op, NeutralElem,
DAG.getVectorIdxConstant(Idx, dl));		DAG.getVectorIdxConstant(Idx, dl));

return DAG.getNode(N->getOpcode(), dl, N->getValueType(0), Op, Flags);		return DAG.getNode(Opc, dl, N->getValueType(0), Op, Flags);
		}

		SDValue DAGTypeLegalizer::WidenVecOp_VECREDUCE_SEQ(SDNode *N) {
		SDLoc dl(N);
		SDValue AccOp = N->getOperand(0);
		SDValue VecOp = N->getOperand(1);
		SDValue Op = GetWidenedVector(VecOp);

		EVT OrigVT = VecOp.getValueType();
		EVT WideVT = Op.getValueType();
		EVT ElemVT = OrigVT.getVectorElementType();
		SDNodeFlags Flags = N->getFlags();

		unsigned Opc = N->getOpcode();
		unsigned BaseOpc = ISD::getVecReduceBaseOpcode(Opc);
		SDValue NeutralElem = DAG.getNeutralElement(BaseOpc, dl, ElemVT, Flags);

		// Pad the vector with the neutral element.
		unsigned OrigElts = OrigVT.getVectorNumElements();
		unsigned WideElts = WideVT.getVectorNumElements();
		for (unsigned Idx = OrigElts; Idx < WideElts; Idx++)
		Op = DAG.getNode(ISD::INSERT_VECTOR_ELT, dl, WideVT, Op, NeutralElem,
		DAG.getVectorIdxConstant(Idx, dl));

		return DAG.getNode(Opc, dl, N->getValueType(0), AccOp, Op, Flags);
}		}
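
The widening above pads the extra lanes with a neutral element, so the result is unchanged but the extra lanes still cost additions (the v3f32 case in the summary). The sketch below models that in plain scalar C++. It is not LLVM code: `seqFAddPadded` is a hypothetical helper, and the choice of `-0.0f` as the padding value is an assumption based on IEEE-754 arithmetic under default rounding, where `x + (-0.0)` equals `x` for every `x`. The value that `DAG.getNeutralElement` actually returns is not visible in this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model: widen v to wideElts lanes by appending `neutral`,
// then run the ordered reduction over all lanes, padding included.
float seqFAddPadded(float acc, std::vector<float> v, std::size_t wideElts,
                    float neutral) {
  assert(wideElts >= v.size() && "widening never shrinks the vector");
  v.resize(wideElts, neutral); // pad the tail lanes with the neutral element
  for (float x : v)
    acc = acc + x;             // one addition per lane, padded lanes too
  return acc;
}
```

In this model, padding with `-0.0f` keeps an all-negative-zero sum at negative zero, while padding with `+0.0f` would flip the sign of that zero. That is one reason the exact neutral constant matters for strict floating-point semantics.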

SDValue DAGTypeLegalizer::WidenVecOp_VSELECT(SDNode *N) {		SDValue DAGTypeLegalizer::WidenVecOp_VSELECT(SDNode *N) {
// This only gets called in the case that the left and right inputs and		// This only gets called in the case that the left and right inputs and
// result are of a legal odd vector type, and the condition is illegal i1 of		// result are of a legal odd vector type, and the condition is illegal i1 of
// the same odd width that needs widening.		// the same odd width that needs widening.
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
assert(VT.isVector() && !VT.isPow2VectorType() && isTypeLegal(VT));		assert(VT.isVector() && !VT.isPow2VectorType() && isTypeLegal(VT));
Show All 9 Lines	return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VT, Select,
DAG.getVectorIdxConstant(0, DL));		DAG.getVectorIdxConstant(0, DL));
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Vector Widening Utilities		// Vector Widening Utilities
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

// Utility function to find the type to chop up a widen vector for load/store		// Utility function to find the type to chop up a widen vector for load/store
// TLI: Target lowering used to determine legal types.		// TLI: Target lowering used to determine legal types.
nikic: Again, this is a recurring pattern, so I have extracted a `SelectionDAG::getNeutralElement()` API for this. The invocation should be: `SDValue NeutralElem = DAG.getNeutralElement(ISD::getVecReduceBaseOpcode(N->getOpcode()), dl, ElemVT, Flags);`
cameron.mcinally (author): I improvised a little here for readability. Is that okay with you, @nikic? Although, I wonder if getting the BaseOpc is a necessary step here. Did you consider just adding VECREDUCE_* cases to `DAG.getNeutralElement(...)`?
nikic: > I improvised a little here for readability. Is that okay with you, @nikic?
Sure!
> Although, I wonder if getting the BaseOpc is a necessary step here. Did you consider just adding VECREDUCE_* cases to `DAG.getNeutralElement(...)`?
Not strictly necessary, but I personally found it a bit more elegant to not repeat the full VECREDUCE list another time here.
// Width: Width left need to load/store.		// Width: Width left need to load/store.
// WidenVT: The widen vector type to load to/store from		// WidenVT: The widen vector type to load to/store from
// Align: If 0, don't allow use of a wider type		// Align: If 0, don't allow use of a wider type
// WidenEx: If Align is not 0, the amount additional we can load/store from.		// WidenEx: If Align is not 0, the amount additional we can load/store from.

static EVT FindMemType(SelectionDAG& DAG, const TargetLowering &TLI,		static EVT FindMemType(SelectionDAG& DAG, const TargetLowering &TLI,
unsigned Width, EVT WidenVT,		unsigned Width, EVT WidenVT,
unsigned Align = 0, unsigned WidenEx = 0) {		unsigned Align = 0, unsigned WidenEx = 0) {
▲ Show 20 Lines • Show All 424 Lines • Show Last 20 Lines

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

Show First 20 Lines • Show All 332 Lines • ▼ Show 20 Lines	bool ISD::matchBinaryPredicate(
return true;		return true;
}		}

ISD::NodeType ISD::getVecReduceBaseOpcode(unsigned VecReduceOpcode) {		ISD::NodeType ISD::getVecReduceBaseOpcode(unsigned VecReduceOpcode) {
switch (VecReduceOpcode) {		switch (VecReduceOpcode) {
default:		default:
llvm_unreachable("Expected VECREDUCE opcode");		llvm_unreachable("Expected VECREDUCE opcode");
case ISD::VECREDUCE_FADD:		case ISD::VECREDUCE_FADD:
		case ISD::VECREDUCE_SEQ_FADD:
return ISD::FADD;		return ISD::FADD;
case ISD::VECREDUCE_FMUL:		case ISD::VECREDUCE_FMUL:
return ISD::FMUL;		return ISD::FMUL;
case ISD::VECREDUCE_ADD:		case ISD::VECREDUCE_ADD:
return ISD::ADD;		return ISD::ADD;
case ISD::VECREDUCE_MUL:		case ISD::VECREDUCE_MUL:
return ISD::MUL;		return ISD::MUL;
case ISD::VECREDUCE_AND:		case ISD::VECREDUCE_AND:
▲ Show 20 Lines • Show All 9,828 Lines • Show Last 20 Lines

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

Show First 20 Lines • Show All 8,024 Lines • ▼ Show 20 Lines	for (unsigned i = 1; i < NumElts; i++)
Res = DAG.getNode(BaseOpcode, dl, EltVT, Res, Ops[i], Node->getFlags());		Res = DAG.getNode(BaseOpcode, dl, EltVT, Res, Ops[i], Node->getFlags());

// Result type may be wider than element type.		// Result type may be wider than element type.
if (EltVT != Node->getValueType(0))		if (EltVT != Node->getValueType(0))
Res = DAG.getNode(ISD::ANY_EXTEND, dl, Node->getValueType(0), Res);		Res = DAG.getNode(ISD::ANY_EXTEND, dl, Node->getValueType(0), Res);
return Res;		return Res;
}		}

		SDValue TargetLowering::expandVecReduceSeq(SDNode *Node, SelectionDAG &DAG) const {
		SDLoc dl(Node);
		SDValue AccOp = Node->getOperand(0);
		SDValue VecOp = Node->getOperand(1);
		SDNodeFlags Flags = Node->getFlags();

		EVT VT = VecOp.getValueType();
		EVT EltVT = VT.getVectorElementType();
		unsigned NumElts = VT.getVectorNumElements();

		SmallVector<SDValue, 8> Ops;
		DAG.ExtractVectorElements(VecOp, Ops, 0, NumElts);

		unsigned BaseOpcode = ISD::getVecReduceBaseOpcode(Node->getOpcode());

		SDValue Res = AccOp;
		for (unsigned i = 0; i < NumElts; i++)
		Res = DAG.getNode(BaseOpcode, dl, EltVT, Res, Ops[i], Flags);

		return Res;
		}
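
The expansion above builds a strict left-to-right chain seeded with the accumulator. The sketch below is one way to see why that order has to be preserved when reassociation is not allowed. It is a scalar model in plain C++, not LLVM code; `orderedFAdd` and `pairwiseFAdd` are hypothetical names, and the pairwise tree is only one example of a reassociated order, assuming a power-of-two element count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the ordered expansion: Res = Acc, then
// Res = Res + Ops[i] for i = 0 .. NumElts-1.
float orderedFAdd(float acc, const std::vector<float> &v) {
  for (std::size_t i = 0; i < v.size(); ++i)
    acc = acc + v[i];
  return acc;
}

// Hypothetical model of a reassociated reduction: combine neighbours in a
// pairwise tree, then add the accumulator at the end.
float pairwiseFAdd(float acc, std::vector<float> v) {
  assert(!v.empty() && (v.size() & (v.size() - 1)) == 0 &&
         "model assumes a power-of-two element count");
  while (v.size() > 1) {
    std::vector<float> next;
    for (std::size_t i = 0; i + 1 < v.size(); i += 2)
      next.push_back(v[i] + v[i + 1]);
    v = next;
  }
  return acc + v[0];
}
```

For the input `{1e8f, 1.0f, -1e8f, 1.0f}` with a zero accumulator, single-precision rounding absorbs each `1.0f` that is added directly to `±1e8f`. The ordered chain ends at `1.0f`, while the pairwise tree ends at `0.0f`, so the two orders are not interchangeable without fast-math permission.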

bool TargetLowering::expandREM(SDNode *Node, SDValue &Result,		bool TargetLowering::expandREM(SDNode *Node, SDValue &Result,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
EVT VT = Node->getValueType(0);		EVT VT = Node->getValueType(0);
SDLoc dl(Node);		SDLoc dl(Node);
bool isSigned = Node->getOpcode() == ISD::SREM;		bool isSigned = Node->getOpcode() == ISD::SREM;
unsigned DivOpc = isSigned ? ISD::SDIV : ISD::UDIV;		unsigned DivOpc = isSigned ? ISD::SDIV : ISD::UDIV;
unsigned DivRemOpc = isSigned ? ISD::SDIVREM : ISD::UDIVREM;		unsigned DivRemOpc = isSigned ? ISD::SDIVREM : ISD::UDIVREM;
SDValue Dividend = Node->getOperand(0);		SDValue Dividend = Node->getOperand(0);
Show All 14 Lines

llvm/lib/CodeGen/TargetLoweringBase.cpp

Show First 20 Lines • Show All 727 Lines • ▼ Show 20 Lines	#include "llvm/IR/ConstrainedOps.def"
setOperationAction(ISD::VECREDUCE_OR, VT, Expand);		setOperationAction(ISD::VECREDUCE_OR, VT, Expand);
setOperationAction(ISD::VECREDUCE_XOR, VT, Expand);		setOperationAction(ISD::VECREDUCE_XOR, VT, Expand);
setOperationAction(ISD::VECREDUCE_SMAX, VT, Expand);		setOperationAction(ISD::VECREDUCE_SMAX, VT, Expand);
setOperationAction(ISD::VECREDUCE_SMIN, VT, Expand);		setOperationAction(ISD::VECREDUCE_SMIN, VT, Expand);
setOperationAction(ISD::VECREDUCE_UMAX, VT, Expand);		setOperationAction(ISD::VECREDUCE_UMAX, VT, Expand);
setOperationAction(ISD::VECREDUCE_UMIN, VT, Expand);		setOperationAction(ISD::VECREDUCE_UMIN, VT, Expand);
setOperationAction(ISD::VECREDUCE_FMAX, VT, Expand);		setOperationAction(ISD::VECREDUCE_FMAX, VT, Expand);
setOperationAction(ISD::VECREDUCE_FMIN, VT, Expand);		setOperationAction(ISD::VECREDUCE_FMIN, VT, Expand);
		setOperationAction(ISD::VECREDUCE_SEQ_FADD, VT, Expand);
}		}

// Most targets ignore the @llvm.prefetch intrinsic.		// Most targets ignore the @llvm.prefetch intrinsic.
setOperationAction(ISD::PREFETCH, MVT::Other, Expand);		setOperationAction(ISD::PREFETCH, MVT::Other, Expand);

// Most targets also ignore the @llvm.readcyclecounter intrinsic.		// Most targets also ignore the @llvm.readcyclecounter intrinsic.
setOperationAction(ISD::READCYCLECOUNTER, MVT::i64, Expand);		setOperationAction(ISD::READCYCLECOUNTER, MVT::i64, Expand);

▲ Show 20 Lines • Show All 1,462 Lines • Show Last 20 Lines

llvm/lib/Target/AArch64/AArch64ISelLowering.h

Show First 20 Lines • Show All 771 Lines • ▼ Show 20 Lines	public:
/// merge. However, merging them creates a BUILD_VECTOR that is just as		/// merge. However, merging them creates a BUILD_VECTOR that is just as
/// illegal as the original, thus leading to an infinite legalisation loop.		/// illegal as the original, thus leading to an infinite legalisation loop.
/// NOTE: Once BUILD_VECTOR is legal or can be custom lowered for all legal		/// NOTE: Once BUILD_VECTOR is legal or can be custom lowered for all legal
/// vector types this override can be removed.		/// vector types this override can be removed.
bool mergeStoresAfterLegalization(EVT VT) const override {		bool mergeStoresAfterLegalization(EVT VT) const override {
return !useSVEForFixedLengthVectors();		return !useSVEForFixedLengthVectors();
}		}

// FIXME: Move useSVEForFixedLengthVectors*() back to private scope once
// reduction legalization is complete.
bool useSVEForFixedLengthVectors() const;
// Normally SVE is only used for byte size vectors that do not fit within a
// NEON vector. This changes when OverrideNEON is true, allowing SVE to be
// used for 64bit and 128bit vectors as well.
bool useSVEForFixedLengthVectorVT(EVT VT, bool OverrideNEON = false) const;

private:		private:
/// Keep a pointer to the AArch64Subtarget around so that we can		/// Keep a pointer to the AArch64Subtarget around so that we can
/// make the right decision when generating code for different targets.		/// make the right decision when generating code for different targets.
const AArch64Subtarget *Subtarget;		const AArch64Subtarget *Subtarget;

bool isExtFreeImpl(const Instruction *Ext) const override;		bool isExtFreeImpl(const Instruction *Ext) const override;

void addTypeForNEON(MVT VT, MVT PromotedBitwiseVT);		void addTypeForNEON(MVT VT, MVT PromotedBitwiseVT);
▲ Show 20 Lines • Show All 211 Lines • ▼ Show 20 Lines	void ReplaceExtractSubVectorResults(SDNode *N,
SelectionDAG &DAG) const;		SelectionDAG &DAG) const;

bool shouldNormalizeToSelectSequence(LLVMContext &, EVT) const override;		bool shouldNormalizeToSelectSequence(LLVMContext &, EVT) const override;

void finalizeLowering(MachineFunction &MF) const override;		void finalizeLowering(MachineFunction &MF) const override;

bool shouldLocalize(const MachineInstr &MI,		bool shouldLocalize(const MachineInstr &MI,
const TargetTransformInfo *TTI) const override;		const TargetTransformInfo *TTI) const override;

		bool useSVEForFixedLengthVectors() const;
		// Normally SVE is only used for byte size vectors that do not fit within a
		// NEON vector. This changes when OverrideNEON is true, allowing SVE to be
		// used for 64bit and 128bit vectors as well.
		bool useSVEForFixedLengthVectorVT(EVT VT, bool OverrideNEON = false) const;
};		};

namespace AArch64 {		namespace AArch64 {
FastISel *createFastISel(FunctionLoweringInfo &funcInfo,		FastISel *createFastISel(FunctionLoweringInfo &funcInfo,
const TargetLibraryInfo *libInfo);		const TargetLibraryInfo *libInfo);
} // end namespace AArch64		} // end namespace AArch64

} // end namespace llvm		} // end namespace llvm

#endif		#endif

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h

Show First 20 Lines • Show All 217 Lines • ▼ Show 20 Lines	int getInterleavedMemoryOpCost(
bool UseMaskForCond = false, bool UseMaskForGaps = false);		bool UseMaskForCond = false, bool UseMaskForGaps = false);

bool		bool
shouldConsiderAddressTypePromotion(const Instruction &I,		shouldConsiderAddressTypePromotion(const Instruction &I,
bool &AllowPromotionWithoutCommonHeader);		bool &AllowPromotionWithoutCommonHeader);

bool shouldExpandReduction(const IntrinsicInst *II) const {		bool shouldExpandReduction(const IntrinsicInst *II) const {
switch (II->getIntrinsicID()) {		switch (II->getIntrinsicID()) {
case Intrinsic::vector_reduce_fadd: {
Value *VecOp = II->getArgOperand(1);
EVT VT = TLI->getValueType(getDataLayout(), VecOp->getType());
if (ST->hasSVE() &&
TLI->useSVEForFixedLengthVectorVT(VT, /*OverrideNEON=*/true))
return false;

return !II->getFastMathFlags().allowReassoc();
}
case Intrinsic::vector_reduce_fmul:		case Intrinsic::vector_reduce_fmul:
// We don't have legalization support for ordered FP reductions.		// We don't have legalization support for ordered FMUL reductions.
return !II->getFastMathFlags().allowReassoc();		return !II->getFastMathFlags().allowReassoc();

default:		default:
// Don't expand anything else, let legalization deal with it.		// Don't expand anything else, let legalization deal with it.
return false;		return false;
}		}
}		}

Show All 19 Lines

llvm/lib/Target/ARM/ARMTargetTransformInfo.h

Show First 20 Lines • Show All 189 Lines • ▼ Show 20 Lines	public:
bool preferInLoopReduction(unsigned Opcode, Type *Ty,		bool preferInLoopReduction(unsigned Opcode, Type *Ty,
TTI::ReductionFlags Flags) const;		TTI::ReductionFlags Flags) const;

bool preferPredicatedReductionSelect(unsigned Opcode, Type *Ty,		bool preferPredicatedReductionSelect(unsigned Opcode, Type *Ty,
TTI::ReductionFlags Flags) const;		TTI::ReductionFlags Flags) const;

bool shouldExpandReduction(const IntrinsicInst *II) const {		bool shouldExpandReduction(const IntrinsicInst *II) const {
switch (II->getIntrinsicID()) {		switch (II->getIntrinsicID()) {
case Intrinsic::vector_reduce_fadd:
case Intrinsic::vector_reduce_fmul:		case Intrinsic::vector_reduce_fmul:
// We don't have legalization support for ordered FP reductions.		// We don't have legalization support for ordered FMUL reductions.
return !II->getFastMathFlags().allowReassoc();		return !II->getFastMathFlags().allowReassoc();
default:		default:
// Don't expand anything else, let legalization deal with it.		// Don't expand anything else, let legalization deal with it.
return false;		return false;
}		}
}		}

int getCFInstrCost(unsigned Opcode,		int getCFInstrCost(unsigned Opcode,
▲ Show 20 Lines • Show All 83 Lines • Show Last 20 Lines

llvm/test/CodeGen/AArch64/sve-fixed-length-fp-reduce.ll

	Show First 20 Lines • Show All 57 Lines • ▼ Show 20 Lines
	define half @fadda_v32f16(half %start, <32 x half>* %a) #0 {			define half @fadda_v32f16(half %start, <32 x half>* %a) #0 {
	; CHECK-LABEL: fadda_v32f16:			; CHECK-LABEL: fadda_v32f16:
	; VBITS_GE_512: ptrue [[PG:p[0-9]+]].h, vl32			; VBITS_GE_512: ptrue [[PG:p[0-9]+]].h, vl32
	; VBITS_GE_512-NEXT: ld1h { [[OP:z[0-9]+]].h }, [[PG]]/z, [x0]			; VBITS_GE_512-NEXT: ld1h { [[OP:z[0-9]+]].h }, [[PG]]/z, [x0]
	; VBITS_GE_512-NEXT: fadda h0, [[PG]], h0, [[OP]].h			; VBITS_GE_512-NEXT: fadda h0, [[PG]], h0, [[OP]].h
	; VBITS_GE_512-NEXT: ret			; VBITS_GE_512-NEXT: ret

	; Ensure sensible type legalisation.			; Ensure sensible type legalisation.
	; VBITS_EQ_256-COUNT-32: fadd			; VBITS_EQ_256: add x8, x0, #32
	; VBITS_EQ_256: ret			; VBITS_EQ_256-NEXT: ptrue [[PG:p[0-9]+]].h, vl16
				; VBITS_EQ_256-DAG: ld1h { [[LO:z[0-9]+]].h }, [[PG]]/z, [x0]
				; VBITS_EQ_256-DAG: ld1h { [[HI:z[0-9]+]].h }, [[PG]]/z, [x8]
				; VBITS_EQ_256-NEXT: fadda h0, [[PG]], h0, [[LO]].h
				; VBITS_EQ_256-NEXT: fadda h0, [[PG]], h0, [[HI]].h
				; VBITS_EQ_256-NEXT: ret
	%op = load <32 x half>, <32 x half>* %a			%op = load <32 x half>, <32 x half>* %a
	%res = call half @llvm.vector.reduce.fadd.v32f16(half %start, <32 x half> %op)			%res = call half @llvm.vector.reduce.fadd.v32f16(half %start, <32 x half> %op)
	ret half %res			ret half %res
	}			}

	define half @fadda_v64f16(half %start, <64 x half>* %a) #0 {			define half @fadda_v64f16(half %start, <64 x half>* %a) #0 {
	; CHECK-LABEL: fadda_v64f16:			; CHECK-LABEL: fadda_v64f16:
	; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].h, vl64			; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].h, vl64
	▲ Show 20 Lines • Show All 50 Lines • ▼ Show 20 Lines
	define float @fadda_v16f32(float %start, <16 x float>* %a) #0 {			define float @fadda_v16f32(float %start, <16 x float>* %a) #0 {
	; CHECK-LABEL: fadda_v16f32:			; CHECK-LABEL: fadda_v16f32:
	; VBITS_GE_512: ptrue [[PG:p[0-9]+]].s, vl16			; VBITS_GE_512: ptrue [[PG:p[0-9]+]].s, vl16
	; VBITS_GE_512-NEXT: ld1w { [[OP:z[0-9]+]].s }, [[PG]]/z, [x0]			; VBITS_GE_512-NEXT: ld1w { [[OP:z[0-9]+]].s }, [[PG]]/z, [x0]
	; VBITS_GE_512-NEXT: fadda s0, [[PG]], s0, [[OP]].s			; VBITS_GE_512-NEXT: fadda s0, [[PG]], s0, [[OP]].s
	; VBITS_GE_512-NEXT: ret			; VBITS_GE_512-NEXT: ret

	; Ensure sensible type legalisation.			; Ensure sensible type legalisation.
	; VBITS_EQ_256-COUNT-16: fadd			; VBITS_EQ_256: add x8, x0, #32
	; VBITS_EQ_256: ret			; VBITS_EQ_256-NEXT: ptrue [[PG:p[0-9]+]].s, vl8
				; VBITS_EQ_256-DAG: ld1w { [[LO:z[0-9]+]].s }, [[PG]]/z, [x0]
				; VBITS_EQ_256-DAG: ld1w { [[HI:z[0-9]+]].s }, [[PG]]/z, [x8]
				; VBITS_EQ_256-NEXT: fadda s0, [[PG]], s0, [[LO]].s
				; VBITS_EQ_256-NEXT: fadda s0, [[PG]], s0, [[HI]].s
				; VBITS_EQ_256-NEXT: ret
	%op = load <16 x float>, <16 x float>* %a			%op = load <16 x float>, <16 x float>* %a
	%res = call float @llvm.vector.reduce.fadd.v16f32(float %start, <16 x float> %op)			%res = call float @llvm.vector.reduce.fadd.v16f32(float %start, <16 x float> %op)
	ret float %res			ret float %res
	}			}

	define float @fadda_v32f32(float %start, <32 x float>* %a) #0 {			define float @fadda_v32f32(float %start, <32 x float>* %a) #0 {
	; CHECK-LABEL: fadda_v32f32:			; CHECK-LABEL: fadda_v32f32:
	; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].s, vl32			; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].s, vl32
	▲ Show 20 Lines • Show All 50 Lines • ▼ Show 20 Lines
	define double @fadda_v8f64(double %start, <8 x double>* %a) #0 {			define double @fadda_v8f64(double %start, <8 x double>* %a) #0 {
	; CHECK-LABEL: fadda_v8f64:			; CHECK-LABEL: fadda_v8f64:
	; VBITS_GE_512: ptrue [[PG:p[0-9]+]].d, vl8			; VBITS_GE_512: ptrue [[PG:p[0-9]+]].d, vl8
	; VBITS_GE_512-NEXT: ld1d { [[OP:z[0-9]+]].d }, [[PG]]/z, [x0]			; VBITS_GE_512-NEXT: ld1d { [[OP:z[0-9]+]].d }, [[PG]]/z, [x0]
	; VBITS_GE_512-NEXT: fadda d0, [[PG]], d0, [[OP]].d			; VBITS_GE_512-NEXT: fadda d0, [[PG]], d0, [[OP]].d
	; VBITS_GE_512-NEXT: ret			; VBITS_GE_512-NEXT: ret

	; Ensure sensible type legalisation.			; Ensure sensible type legalisation.
	; VBITS_EQ_256-COUNT-8: fadd			; VBITS_EQ_256: add x8, x0, #32
	; VBITS_EQ_256: ret			; VBITS_EQ_256-NEXT: ptrue [[PG:p[0-9]+]].d, vl4
				; VBITS_EQ_256-DAG: ld1d { [[LO:z[0-9]+]].d }, [[PG]]/z, [x0]
				; VBITS_EQ_256-DAG: ld1d { [[HI:z[0-9]+]].d }, [[PG]]/z, [x8]
				; VBITS_EQ_256-NEXT: fadda d0, [[PG]], d0, [[LO]].d
				; VBITS_EQ_256-NEXT: fadda d0, [[PG]], d0, [[HI]].d
				; VBITS_EQ_256-NEXT: ret
	%op = load <8 x double>, <8 x double>* %a			%op = load <8 x double>, <8 x double>* %a
	%res = call double @llvm.vector.reduce.fadd.v8f64(double %start, <8 x double> %op)			%res = call double @llvm.vector.reduce.fadd.v8f64(double %start, <8 x double> %op)
	ret double %res			ret double %res
	}			}

	define double @fadda_v16f64(double %start, <16 x double>* %a) #0 {			define double @fadda_v16f64(double %start, <16 x double>* %a) #0 {
	; CHECK-LABEL: fadda_v16f64:			; CHECK-LABEL: fadda_v16f64:
	; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].d, vl16			; VBITS_GE_1024: ptrue [[PG:p[0-9]+]].d, vl16
	[757 lines not shown]

llvm/test/CodeGen/AArch64/vecreduce-fadd-legalization-strict.ll

	[102 lines not shown]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%b = call float @llvm.vector.reduce.fadd.f32.v3f32(float %s, <3 x float> %a)			%b = call float @llvm.vector.reduce.fadd.f32.v3f32(float %s, <3 x float> %a)
	ret float %b			ret float %b
	}			}

	define float @test_v3f32_neutral(<3 x float> %a) nounwind {			define float @test_v3f32_neutral(<3 x float> %a) nounwind {
	; CHECK-LABEL: test_v3f32_neutral:			; CHECK-LABEL: test_v3f32_neutral:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: faddp s1, v0.2s			; CHECK-NEXT: mov s1, v0.s[2]
	; CHECK-NEXT: mov s0, v0.s[2]			; CHECK-NEXT: faddp s0, v0.2s
	; CHECK-NEXT: fadd s0, s1, s0			; CHECK-NEXT: fadd s0, s0, s1
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%b = call float @llvm.vector.reduce.fadd.f32.v3f32(float -0.0, <3 x float> %a)			%b = call float @llvm.vector.reduce.fadd.f32.v3f32(float -0.0, <3 x float> %a)
	ret float %b			ret float %b
	}			}

	define float @test_v5f32(<5 x float> %a, float %s) nounwind {			define float @test_v5f32(<5 x float> %a, float %s) nounwind {
	; CHECK-LABEL: test_v5f32:			; CHECK-LABEL: test_v5f32:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	[46 lines not shown]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%b = call fp128 @llvm.vector.reduce.fadd.f128.v2f128(fp128 0xL00000000000000008000000000000000, <2 x fp128> %a)			%b = call fp128 @llvm.vector.reduce.fadd.f128.v2f128(fp128 0xL00000000000000008000000000000000, <2 x fp128> %a)
	ret fp128 %b			ret fp128 %b
	}			}

	define float @test_v16f32(<16 x float> %a, float %s) nounwind {			define float @test_v16f32(<16 x float> %a, float %s) nounwind {
	; CHECK-LABEL: test_v16f32:			; CHECK-LABEL: test_v16f32:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: fadd s4, s4, s0			; CHECK-NEXT: mov s22, v0.s[3]
	; CHECK-NEXT: mov s5, v0.s[1]			; CHECK-NEXT: mov s23, v0.s[2]
	; CHECK-NEXT: fadd s4, s4, s5			; CHECK-NEXT: mov s24, v0.s[1]
	; CHECK-NEXT: mov s5, v0.s[2]
	; CHECK-NEXT: mov s0, v0.s[3]
	; CHECK-NEXT: fadd s4, s4, s5
	; CHECK-NEXT: fadd s0, s4, s0			; CHECK-NEXT: fadd s0, s4, s0
	; CHECK-NEXT: mov s5, v1.s[1]			; CHECK-NEXT: fadd s0, s0, s24
	; CHECK-NEXT: fadd s0, s0, s1			; CHECK-NEXT: fadd s0, s0, s23
	; CHECK-NEXT: mov s4, v1.s[2]			; CHECK-NEXT: fadd s0, s0, s22
	; CHECK-NEXT: fadd s0, s0, s5			; CHECK-NEXT: mov s21, v1.s[1]
	; CHECK-NEXT: mov s1, v1.s[3]			; CHECK-NEXT: fadd s0, s0, s1
	; CHECK-NEXT: fadd s0, s0, s4			; CHECK-NEXT: mov s20, v1.s[2]
	; CHECK-NEXT: fadd s0, s0, s1			; CHECK-NEXT: fadd s0, s0, s21
	; CHECK-NEXT: mov s5, v2.s[1]			; CHECK-NEXT: mov s19, v1.s[3]
	; CHECK-NEXT: fadd s0, s0, s2			; CHECK-NEXT: fadd s0, s0, s20
	; CHECK-NEXT: mov s4, v2.s[2]			; CHECK-NEXT: fadd s0, s0, s19
	; CHECK-NEXT: fadd s0, s0, s5			; CHECK-NEXT: mov s18, v2.s[1]
	; CHECK-NEXT: mov s1, v2.s[3]			; CHECK-NEXT: fadd s0, s0, s2
	; CHECK-NEXT: fadd s0, s0, s4			; CHECK-NEXT: mov s17, v2.s[2]
	; CHECK-NEXT: fadd s0, s0, s1			; CHECK-NEXT: fadd s0, s0, s18
	; CHECK-NEXT: mov s2, v3.s[1]			; CHECK-NEXT: mov s16, v2.s[3]
				; CHECK-NEXT: fadd s0, s0, s17
				; CHECK-NEXT: fadd s0, s0, s16
				; CHECK-NEXT: mov s7, v3.s[1]
	; CHECK-NEXT: fadd s0, s0, s3			; CHECK-NEXT: fadd s0, s0, s3
	; CHECK-NEXT: mov s5, v3.s[2]			; CHECK-NEXT: mov s6, v3.s[2]
	; CHECK-NEXT: fadd s0, s0, s2			; CHECK-NEXT: fadd s0, s0, s7
				; CHECK-NEXT: mov s5, v3.s[3]
				; CHECK-NEXT: fadd s0, s0, s6
	; CHECK-NEXT: fadd s0, s0, s5			; CHECK-NEXT: fadd s0, s0, s5
	; CHECK-NEXT: mov s1, v3.s[3]
	; CHECK-NEXT: fadd s0, s0, s1
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%b = call float @llvm.vector.reduce.fadd.f32.v16f32(float %s, <16 x float> %a)			%b = call float @llvm.vector.reduce.fadd.f32.v16f32(float %s, <16 x float> %a)
	ret float %b			ret float %b
	}			}

	define float @test_v16f32_neutral(<16 x float> %a) nounwind {			define float @test_v16f32_neutral(<16 x float> %a) nounwind {
	; CHECK-LABEL: test_v16f32_neutral:			; CHECK-LABEL: test_v16f32_neutral:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: faddp s4, v0.2s			; CHECK-NEXT: mov s21, v0.s[3]
	; CHECK-NEXT: mov s5, v0.s[2]			; CHECK-NEXT: mov s22, v0.s[2]
	; CHECK-NEXT: mov s0, v0.s[3]			; CHECK-NEXT: faddp s0, v0.2s
	; CHECK-NEXT: fadd s4, s4, s5			; CHECK-NEXT: fadd s0, s0, s22
	; CHECK-NEXT: fadd s0, s4, s0			; CHECK-NEXT: fadd s0, s0, s21
	; CHECK-NEXT: mov s5, v1.s[1]			; CHECK-NEXT: mov s20, v1.s[1]
	; CHECK-NEXT: fadd s0, s0, s1			; CHECK-NEXT: fadd s0, s0, s1
	; CHECK-NEXT: mov s4, v1.s[2]			; CHECK-NEXT: mov s19, v1.s[2]
	; CHECK-NEXT: fadd s0, s0, s5			; CHECK-NEXT: fadd s0, s0, s20
	; CHECK-NEXT: mov s1, v1.s[3]			; CHECK-NEXT: mov s18, v1.s[3]
	; CHECK-NEXT: fadd s0, s0, s4			; CHECK-NEXT: fadd s0, s0, s19
	; CHECK-NEXT: fadd s0, s0, s1			; CHECK-NEXT: fadd s0, s0, s18
	; CHECK-NEXT: mov s5, v2.s[1]			; CHECK-NEXT: mov s17, v2.s[1]
	; CHECK-NEXT: fadd s0, s0, s2			; CHECK-NEXT: fadd s0, s0, s2
	; CHECK-NEXT: mov s4, v2.s[2]			; CHECK-NEXT: mov s16, v2.s[2]
	; CHECK-NEXT: fadd s0, s0, s5			; CHECK-NEXT: fadd s0, s0, s17
	; CHECK-NEXT: mov s1, v2.s[3]			; CHECK-NEXT: mov s7, v2.s[3]
	; CHECK-NEXT: fadd s0, s0, s4			; CHECK-NEXT: fadd s0, s0, s16
	; CHECK-NEXT: fadd s0, s0, s1			; CHECK-NEXT: fadd s0, s0, s7
	; CHECK-NEXT: mov s2, v3.s[1]			; CHECK-NEXT: mov s6, v3.s[1]
	; CHECK-NEXT: fadd s0, s0, s3			; CHECK-NEXT: fadd s0, s0, s3
	; CHECK-NEXT: mov s5, v3.s[2]			; CHECK-NEXT: mov s5, v3.s[2]
	; CHECK-NEXT: fadd s0, s0, s2			; CHECK-NEXT: fadd s0, s0, s6
				; CHECK-NEXT: mov s4, v3.s[3]
	; CHECK-NEXT: fadd s0, s0, s5			; CHECK-NEXT: fadd s0, s0, s5
	; CHECK-NEXT: mov s1, v3.s[3]			; CHECK-NEXT: fadd s0, s0, s4
	; CHECK-NEXT: fadd s0, s0, s1
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%b = call float @llvm.vector.reduce.fadd.f32.v16f32(float -0.0, <16 x float> %a)			%b = call float @llvm.vector.reduce.fadd.f32.v16f32(float -0.0, <16 x float> %a)
	ret float %b			ret float %b
	}			}

Diff 302011
