Details

Reviewers

sdesmalen
efriedma
paulwalker-arm

Commits

rG75db7cf78ad5: [SVE][CodeGen] Legalisation of integer -> floating point conversions

Summary

Splitting the operand of a scalable [S|U]INT_TO_FP results in a
concat_vectors operation where the operands are unpacked FP
scalable vectors (e.g. nxv2f32).
This patch adds custom lowering of concat_vectors which
checks that the number of operands is 2, and isel patterns
to match concat_vectors of scalable FP types with uzp1.
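As a rough illustration of why uzp1 can implement this concatenation, here is a minimal, self-contained C++ model. It is not LLVM code. It assumes a fixed 128-bit register viewed as four 32-bit lanes, and it represents an unpacked nxv2f32-style value by placing each element in an even lane with the odd lanes as padding. The names `Lanes`, `unpack` and `uzp1` are hypothetical helpers introduced only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a vector register viewed as 32-bit lanes (assumption: 4 lanes).
using Lanes = std::vector<float>;

// UZP1 on equal-length inputs: the even-numbered lanes of the first source,
// followed by the even-numbered lanes of the second source.
Lanes uzp1(const Lanes &zn, const Lanes &zm) {
  assert(zn.size() == zm.size() && zn.size() % 2 == 0);
  Lanes result;
  result.reserve(zn.size());
  for (std::size_t i = 0; i < zn.size(); i += 2)
    result.push_back(zn[i]);
  for (std::size_t i = 0; i < zm.size(); i += 2)
    result.push_back(zm[i]);
  return result;
}

// Build an "unpacked" value: each element sits in an even 32-bit lane
// (the low half of a 64-bit container); odd lanes hold padding.
Lanes unpack(const std::vector<float> &elems, float padding = -1.0f) {
  Lanes lanes;
  lanes.reserve(elems.size() * 2);
  for (float e : elems) {
    lanes.push_back(e);
    lanes.push_back(padding);
  }
  return lanes;
}
```

In this model, `uzp1(unpack({a0, a1}), unpack({b0, b1}))` drops the padding lanes and yields the packed sequence a0, a1, b0, b1, which is the result a two-operand concat_vectors is expected to produce.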

Diff Detail

Event Timeline

kmclaughlin created this revision. Sep 21 2020, 9:52 AM

Herald added a project: Restricted Project. Sep 21 2020, 9:52 AM

Herald added subscribers: llvm-commits, psnobl, hiraditya, tschuett.

kmclaughlin requested review of this revision. Sep 21 2020, 9:52 AM

kmclaughlin added a parent revision: D87913: [SVE][CodeGen] Lower legal integer -> floating point conversions.

Harbormaster completed remote builds in B72401: Diff 293206. Sep 21 2020, 10:42 AM

paulwalker-arm added inline comments. Sep 22 2020, 3:16 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9090–9092	I imagine that CONCAT_VECTORS operand counts have been normalised to two by this point, but given its restriction that all operands must be the same type, I think you can protect against this and your `VT == OpVT*2` requirement using `if (getNumOperands() != 2)`. Can the isFloatingPoint check be made part of the assert?
9095–9096	Not sure CONCAT_VECTORS have a left and right hand side.

Changes made to the LowerCONCAT_VECTORS function:

Replaced OpVT.getVectorElementCount()*2 condition with Op.getNumOperands() != 2
Moved the isFloatingPoint() check into the assert

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9095–9096	Renamed these to Op0 & Op1

LGTM

Alternatively, we could make CONCAT_VECTORS "legal", and lower it using an isel pattern. But I'm not sure that's actually an improvement.

This revision is now accepted and ready to land. Sep 22 2020, 10:59 AM

paulwalker-arm accepted this revision. Sep 23 2020, 2:22 AM

sdesmalen added inline comments. Sep 23 2020, 2:44 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9090–9092	Can the isFloatingPoint check be made part of the assert? Could this also work for predicates (e.g. nxv8i1 : nxv8i1)?

In D88033#2288287, @efriedma wrote:

LGTM

Alternatively, we could make CONCAT_VECTORS "legal", and lower it using an isel pattern. But I'm not sure that's actually an improvement.

@efriedma At this level do we need to care about CONCAT_VECTORS that do not have two operands? If so then we'll need custom lowering anyway to ensure those cases get expanded?

In D88033#2289829, @paulwalker-arm wrote:

@efriedma At this level do we need to care about CONCAT_VECTORS that do not have two operands? If so then we'll need custom lowering anyway to ensure those cases get expanded?

I think we could end up with a CONCAT_VECTORS with more than two operands. At least, I can't think of anything that would prevent it (as long as the operands have a legal type). You could still pattern-match that to a tree of uzp1, but I'd be more worried about blocking useful optimizations at that point; probably custom-lowering that makes sense.
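To make the "tree of uzp1" idea concrete, here is a small, self-contained C++ model of a four-operand concat of unpacked half-precision vectors. It is only a sketch under stated assumptions: a fixed 128-bit register is modelled as eight 16-bit lanes holding plain integers, and `Reg`, `kPad`, `unpack2`, `uzp1` and `concat4` are hypothetical names, not LLVM APIs. The shape (two uzp1 steps on 32-bit elements, then one on 16-bit elements) mirrors the CHECK lines of scvtf_h_nxv8i64 in the test file.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy register: eight 16-bit lanes of a 128-bit vector, stored as ints.
// kPad marks lanes that hold no element.
using Reg = std::vector<int>;
constexpr int kPad = -1;

// UZP1 at a given element size: view each source as elements made of
// `lanesPerElt` 16-bit lanes (1 models .h, 2 models .s), then keep the
// even-numbered elements of zn followed by those of zm.
Reg uzp1(const Reg &zn, const Reg &zm, std::size_t lanesPerElt) {
  assert(zn.size() == zm.size() && zn.size() % (2 * lanesPerElt) == 0);
  Reg out;
  out.reserve(zn.size());
  for (const Reg *src : {&zn, &zm})
    for (std::size_t base = 0; base < src->size(); base += 2 * lanesPerElt)
      for (std::size_t k = 0; k < lanesPerElt; ++k)
        out.push_back((*src)[base + k]);
  return out;
}

// Unpacked nxv2f16-style value: each element occupies the first 16-bit lane
// of a 64-bit (four-lane) container; the other lanes are padding.
Reg unpack2(int e0, int e1) {
  return {e0, kPad, kPad, kPad, e1, kPad, kPad, kPad};
}

// Four-operand concat expressed as a tree of two-operand steps.
Reg concat4(const Reg &a, const Reg &b, const Reg &c, const Reg &d) {
  Reg ab = uzp1(a, b, 2); // models uzp1 on .s elements
  Reg cd = uzp1(c, d, 2); // models uzp1 on .s elements
  return uzp1(ab, cd, 1); // models uzp1 on .h elements
}
```

Each first-level step turns two values with one element per 64-bit container into one value with one element per 32-bit container; the final step packs those into a fully packed register.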

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9090–9092	We actually have patterns for i1 already. But would probably work, sure.

Simplified the LowerCONCAT_VECTORS function so that it returns Op if the number of operands is 2.
Added ISel patterns for lowering floating-point concat_vectors, making this consistent with how we lower concats of predicate types.

kmclaughlin edited the summary of this revision. Sep 24 2020, 10:51 AM

LGTM

efriedma added inline comments. Sep 24 2020, 11:28 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
991	Thinking about it a little more, we probably only want to mark CONCAT_VECTORS "Custom" for vectors with four or more elements. It's never legal if the result is a two-element vector.

paulwalker-arm added inline comments. Sep 24 2020, 4:16 PM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9086–9088	This can be summed up as `VT.isScalableVector() && isTypeLegal(VT) && isTypeLegal(Op.getOperand(0).getValueType())` but given Eli's comment you can just add the `isTypeLegal(Op.getOperand(0).getValueType())` part to the if statement to cover both bases.
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1736–1740	I doubt you need these anymore.

Restricted the concat_vectors marked as 'custom' to those with result types of 4+ elements only
Removed unused patterns for reinterpret_cast
Moved additional isTypeLegal check to the if statement in LowerCONCAT_VECTORS

Reverted previous change to restrict the concat_vectors marked as custom. The extra check added to LowerCONCAT_VECTORS (isTypeLegal(Op.getOperand(0)...) will cover cases where the result is a two element vector.
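The effect of that operand-type check can be sketched with a toy model. The C++ below is not the LLVM implementation: `ToyType`, `ToyConcat`, `Action`, `isToyTypeLegal` and `lowerToyConcat` are hypothetical names, and the legality rule is a deliberately simplified assumption (a power-of-two element count of at least two, fitting in a 128-bit granule). It only shows how a single condition rejects both the two-element-result case and concats with more than two operands.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a scalable type <vscale x MinElts x iN/fN>.
struct ToyType {
  unsigned MinElts;
  unsigned EltBits;
};

// Simplified legality rule assumed for this sketch only.
bool isToyTypeLegal(ToyType T) {
  bool PowerOfTwo = T.MinElts != 0 && (T.MinElts & (T.MinElts - 1)) == 0;
  return PowerOfTwo && T.MinElts >= 2 && T.MinElts * T.EltBits <= 128;
}

struct ToyConcat {
  ToyType ResultTy;
  std::vector<ToyType> Operands;
};

// KeepForISel models "return Op" (leave the node for the isel patterns);
// Expand models returning an empty value so generic legalisation takes over.
enum class Action { KeepForISel, Expand };

Action lowerToyConcat(const ToyConcat &Op) {
  assert(isToyTypeLegal(Op.ResultTy) && "Expected legal scalable vector type!");
  if (!Op.Operands.empty() && isToyTypeLegal(Op.Operands[0]) &&
      Op.Operands.size() == 2)
    return Action::KeepForISel;
  return Action::Expand;
}
```

Under these assumptions a result with two elements always has one-element operands, which the toy rule treats as illegal, so no separate restriction on the result element count is needed.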

LGTM assuming the potential compiler warning is removed.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9084–9086	I'm pretty sure this will cause an "unused variable - VT" compile time warning when building without asserts and thus will need to be verbose (i.e. `assert(Op.getValueType().isScalable() && ...`)
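The warning arises because `assert` expands to nothing when `NDEBUG` is defined, so a local that is read only inside an assert is never used in release builds. The following self-contained C++ sketch shows two common ways around it; `ToyVT`, `ToyOp` and both functions are hypothetical stand-ins rather than LLVM types, and the review does not say which form was finally committed.

```cpp
#include <cassert>

// Hypothetical stand-ins for a value type and a DAG node.
struct ToyVT {
  bool Scalable;
  bool isScalableVector() const { return Scalable; }
};

struct ToyOp {
  ToyVT VT;
  unsigned NumOperands;
  ToyVT getValueType() const { return VT; }
};

// Problematic shape (kept as a comment): VT is read only by the assert,
// so it is unused once NDEBUG removes the assert.
//   ToyVT VT = Op.getValueType();
//   assert(VT.isScalableVector() && "Expected scalable vector type!");

// Option 1: query the type inside the assert expression itself.
bool keepConcatInline(const ToyOp &Op) {
  assert(Op.getValueType().isScalableVector() &&
         "Expected scalable vector type!");
  return Op.NumOperands == 2;
}

// Option 2: keep the local and mark it as used with a void cast.
bool keepConcatVoidCast(const ToyOp &Op) {
  ToyVT VT = Op.getValueType();
  (void)VT;
  assert(VT.isScalableVector() && "Expected scalable vector type!");
  return Op.NumOperands == 2;
}
```

Both options behave identically at run time; the first is closer to the verbose form suggested in the comment above.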

kmclaughlin added a child revision: D88321: [SVE][CodeGen] Lower scalable fp_extend & fp_round operations. Sep 25 2020, 10:15 AM

Closed by commit rG75db7cf78ad5: [SVE][CodeGen] Legalisation of integer -> floating point conversions (authored by kmclaughlin). Oct 1 2020, 2:45 AM

This revision was automatically updated to reflect the committed changes.

kmclaughlin marked an inline comment as done.

kmclaughlin added a commit: rG75db7cf78ad5: [SVE][CodeGen] Legalisation of integer -> floating point conversions.

Diff 294271

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[...]
for (MVT VT : MVT::integer_scalable_vector_valuetypes()) {
setOperationAction(ISD::SMAX, VT, Custom);		setOperationAction(ISD::SMAX, VT, Custom);
setOperationAction(ISD::UMAX, VT, Custom);		setOperationAction(ISD::UMAX, VT, Custom);
setOperationAction(ISD::SHL, VT, Custom);		setOperationAction(ISD::SHL, VT, Custom);
setOperationAction(ISD::SRL, VT, Custom);		setOperationAction(ISD::SRL, VT, Custom);
setOperationAction(ISD::SRA, VT, Custom);		setOperationAction(ISD::SRA, VT, Custom);
if (VT.getScalarType() == MVT::i1) {		if (VT.getScalarType() == MVT::i1) {
setOperationAction(ISD::SETCC, VT, Custom);		setOperationAction(ISD::SETCC, VT, Custom);
setOperationAction(ISD::TRUNCATE, VT, Custom);		setOperationAction(ISD::TRUNCATE, VT, Custom);
setOperationAction(ISD::CONCAT_VECTORS, VT, Legal);
}		}
}		}
}		}

for (auto VT : {MVT::nxv8i8, MVT::nxv4i16, MVT::nxv2i32}) {		for (auto VT : {MVT::nxv8i8, MVT::nxv4i16, MVT::nxv2i32}) {
setOperationAction(ISD::EXTRACT_SUBVECTOR, VT, Custom);		setOperationAction(ISD::EXTRACT_SUBVECTOR, VT, Custom);
setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);		setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);
}		}

		for (auto VT : {MVT::nxv4i1, MVT::nxv8i1, MVT::nxv16i1,
		MVT::nxv4f16, MVT::nxv8f16, MVT::nxv4f32}) {
		setOperationAction(ISD::CONCAT_VECTORS, VT, Custom);
		}

setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i8, Custom);		setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i8, Custom);
setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i16, Custom);		setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i16, Custom);

for (MVT VT : MVT::fp_scalable_vector_valuetypes()) {		for (MVT VT : MVT::fp_scalable_vector_valuetypes()) {
if (isTypeLegal(VT)) {		if (isTypeLegal(VT)) {
setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);		setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);
		efriedma (inline comment, Not Done): Thinking about it a little more, we probably only want to mark CONCAT_VECTORS "Custom" for vectors with four or more elements. It's never legal if the result is a two-element vector.
setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);		setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);
setOperationAction(ISD::SELECT, VT, Custom);		setOperationAction(ISD::SELECT, VT, Custom);
setOperationAction(ISD::FADD, VT, Custom);		setOperationAction(ISD::FADD, VT, Custom);
setOperationAction(ISD::FDIV, VT, Custom);		setOperationAction(ISD::FDIV, VT, Custom);
setOperationAction(ISD::FMA, VT, Custom);		setOperationAction(ISD::FMA, VT, Custom);
setOperationAction(ISD::FMUL, VT, Custom);		setOperationAction(ISD::FMUL, VT, Custom);
setOperationAction(ISD::FNEG, VT, Custom);		setOperationAction(ISD::FNEG, VT, Custom);
setOperationAction(ISD::FSUB, VT, Custom);		setOperationAction(ISD::FSUB, VT, Custom);
[...]
SDValue AArch64TargetLowering::LowerOperation(SDValue Op,
case ISD::FRAMEADDR:		case ISD::FRAMEADDR:
return LowerFRAMEADDR(Op, DAG);		return LowerFRAMEADDR(Op, DAG);
case ISD::SPONENTRY:		case ISD::SPONENTRY:
return LowerSPONENTRY(Op, DAG);		return LowerSPONENTRY(Op, DAG);
case ISD::RETURNADDR:		case ISD::RETURNADDR:
return LowerRETURNADDR(Op, DAG);		return LowerRETURNADDR(Op, DAG);
case ISD::ADDROFRETURNADDR:		case ISD::ADDROFRETURNADDR:
return LowerADDROFRETURNADDR(Op, DAG);		return LowerADDROFRETURNADDR(Op, DAG);
		case ISD::CONCAT_VECTORS:
		return LowerCONCAT_VECTORS(Op, DAG);
case ISD::INSERT_VECTOR_ELT:		case ISD::INSERT_VECTOR_ELT:
return LowerINSERT_VECTOR_ELT(Op, DAG);		return LowerINSERT_VECTOR_ELT(Op, DAG);
case ISD::EXTRACT_VECTOR_ELT:		case ISD::EXTRACT_VECTOR_ELT:
return LowerEXTRACT_VECTOR_ELT(Op, DAG);		return LowerEXTRACT_VECTOR_ELT(Op, DAG);
case ISD::BUILD_VECTOR:		case ISD::BUILD_VECTOR:
return LowerBUILD_VECTOR(Op, DAG);		return LowerBUILD_VECTOR(Op, DAG);
case ISD::VECTOR_SHUFFLE:		case ISD::VECTOR_SHUFFLE:
return LowerVECTOR_SHUFFLE(Op, DAG);		return LowerVECTOR_SHUFFLE(Op, DAG);
[...]
SDValue AArch64TargetLowering::LowerBUILD_VECTOR(SDValue Op,
}		}

LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "LowerBUILD_VECTOR: use default expansion, failed to find "		dbgs() << "LowerBUILD_VECTOR: use default expansion, failed to find "
"better alternative\n");		"better alternative\n");
return SDValue();		return SDValue();
}		}

		SDValue AArch64TargetLowering::LowerCONCAT_VECTORS(SDValue Op,
		SelectionDAG &DAG) const {
		EVT VT = Op.getValueType();
		assert(VT.isScalableVector() && isTypeLegal(VT) &&
		"Expected legal scalable vector type!");
		paulwalker-arm (inline comment, Done): I'm pretty sure this will cause an "unused variable - VT" compile time warning when building without asserts and thus will need to be verbose (i.e. `assert(Op.getValueType().isScalable() && ...`)

		if (isTypeLegal(Op.getOperand(0).getValueType()) && Op.getNumOperands() == 2)
		paulwalker-arm (inline comment, Done): This can be summed up as `VT.isScalableVector() && isTypeLegal(VT) && isTypeLegal(Op.getOperand(0).getValueType())` but given Eli's comment you can just add the `isTypeLegal(Op.getOperand(0).getValueType())` part to the if statement to cover both bases.
		return Op;

		return SDValue();
		}
		paulwalker-arm (inline comment, Done): I imagine that CONCAT_VECTORS operand counts have been normalised to two by this point, but given its restriction that all operands must be the same type, I think you can protect against this and your `VT == OpVT*2` requirement using `if (getNumOperands() != 2)`. Can the isFloatingPoint check be made part of the assert?
		sdesmalen (inline comment, Not Done): > Can the isFloatingPoint check be made part of the assert? Could this also work for predicates (e.g. nxv8i1 : nxv8i1)?
		efriedma (inline comment, Not Done): We actually have patterns for i1 already. But would probably work, sure.

SDValue AArch64TargetLowering::LowerINSERT_VECTOR_ELT(SDValue Op,		SDValue AArch64TargetLowering::LowerINSERT_VECTOR_ELT(SDValue Op,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
assert(Op.getOpcode() == ISD::INSERT_VECTOR_ELT && "Unknown opcode!");		assert(Op.getOpcode() == ISD::INSERT_VECTOR_ELT && "Unknown opcode!");
		paulwalker-arm (inline comment, Done): Not sure CONCAT_VECTORS have a left and right hand side.
		kmclaughlin (author, inline comment, Done): Renamed these to Op0 & Op1

// Check for non-constant or out of range lane.		// Check for non-constant or out of range lane.
EVT VT = Op.getOperand(0).getValueType();		EVT VT = Op.getOperand(0).getValueType();
ConstantSDNode *CI = dyn_cast<ConstantSDNode>(Op.getOperand(2));		ConstantSDNode *CI = dyn_cast<ConstantSDNode>(Op.getOperand(2));
if (!CI || CI->getZExtValue() >= VT.getVectorNumElements())		if (!CI || CI->getZExtValue() >= VT.getVectorNumElements())
return SDValue();		return SDValue();


[...]

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

[...]
multiclass sve_prefetch<SDPatternOperator prefetch, ValueType PredTy, Instruction RegImmInst, Instruction RegRegInst, int scale, ComplexPattern AddrCP> {
// Concatenate two predicates.		// Concatenate two predicates.
def : Pat<(nxv4i1 (concat_vectors nxv2i1:$p1, nxv2i1:$p2)),		def : Pat<(nxv4i1 (concat_vectors nxv2i1:$p1, nxv2i1:$p2)),
(UZP1_PPP_S $p1, $p2)>;		(UZP1_PPP_S $p1, $p2)>;
def : Pat<(nxv8i1 (concat_vectors nxv4i1:$p1, nxv4i1:$p2)),		def : Pat<(nxv8i1 (concat_vectors nxv4i1:$p1, nxv4i1:$p2)),
(UZP1_PPP_H $p1, $p2)>;		(UZP1_PPP_H $p1, $p2)>;
def : Pat<(nxv16i1 (concat_vectors nxv8i1:$p1, nxv8i1:$p2)),		def : Pat<(nxv16i1 (concat_vectors nxv8i1:$p1, nxv8i1:$p2)),
(UZP1_PPP_B $p1, $p2)>;		(UZP1_PPP_B $p1, $p2)>;

		// Concatenate two floating point vectors.
		def : Pat<(nxv4f16 (concat_vectors nxv2f16:$v1, nxv2f16:$v2)),
		(UZP1_ZZZ_S $v1, $v2)>;
		def : Pat<(nxv8f16 (concat_vectors nxv4f16:$v1, nxv4f16:$v2)),
		(UZP1_ZZZ_H $v1, $v2)>;
		def : Pat<(nxv4f32 (concat_vectors nxv2f32:$v1, nxv2f32:$v2)),
		(UZP1_ZZZ_S $v1, $v2)>;

defm CMPHS_PPzZZ : sve_int_cmp_0<0b000, "cmphs", SETUGE, SETULE>;		defm CMPHS_PPzZZ : sve_int_cmp_0<0b000, "cmphs", SETUGE, SETULE>;
defm CMPHI_PPzZZ : sve_int_cmp_0<0b001, "cmphi", SETUGT, SETULT>;		defm CMPHI_PPzZZ : sve_int_cmp_0<0b001, "cmphi", SETUGT, SETULT>;
defm CMPGE_PPzZZ : sve_int_cmp_0<0b100, "cmpge", SETGE, SETLE>;		defm CMPGE_PPzZZ : sve_int_cmp_0<0b100, "cmpge", SETGE, SETLE>;
defm CMPGT_PPzZZ : sve_int_cmp_0<0b101, "cmpgt", SETGT, SETLT>;		defm CMPGT_PPzZZ : sve_int_cmp_0<0b101, "cmpgt", SETGT, SETLT>;
defm CMPEQ_PPzZZ : sve_int_cmp_0<0b110, "cmpeq", SETEQ, SETEQ>;		defm CMPEQ_PPzZZ : sve_int_cmp_0<0b110, "cmpeq", SETEQ, SETEQ>;
defm CMPNE_PPzZZ : sve_int_cmp_0<0b111, "cmpne", SETNE, SETNE>;		defm CMPNE_PPzZZ : sve_int_cmp_0<0b111, "cmpne", SETNE, SETNE>;

defm CMPEQ_WIDE_PPzZZ : sve_int_cmp_0_wide<0b010, "cmpeq", int_aarch64_sve_cmpeq_wide>;		defm CMPEQ_WIDE_PPzZZ : sve_int_cmp_0_wide<0b010, "cmpeq", int_aarch64_sve_cmpeq_wide>;
[...]
multiclass sve_prefetch<SDPatternOperator prefetch, ValueType PredTy, Instruction RegImmInst, Instruction RegRegInst, int scale, ComplexPattern AddrCP> {
def : Pat<(nxv8i1 (reinterpret_cast (nxv2i1 PPR:$src))), (COPY_TO_REGCLASS PPR:$src, PPR)>;		def : Pat<(nxv8i1 (reinterpret_cast (nxv2i1 PPR:$src))), (COPY_TO_REGCLASS PPR:$src, PPR)>;
def : Pat<(nxv4i1 (reinterpret_cast (nxv16i1 PPR:$src))), (COPY_TO_REGCLASS PPR:$src, PPR)>;		def : Pat<(nxv4i1 (reinterpret_cast (nxv16i1 PPR:$src))), (COPY_TO_REGCLASS PPR:$src, PPR)>;
def : Pat<(nxv4i1 (reinterpret_cast (nxv8i1 PPR:$src))), (COPY_TO_REGCLASS PPR:$src, PPR)>;		def : Pat<(nxv4i1 (reinterpret_cast (nxv8i1 PPR:$src))), (COPY_TO_REGCLASS PPR:$src, PPR)>;
def : Pat<(nxv4i1 (reinterpret_cast (nxv2i1 PPR:$src))), (COPY_TO_REGCLASS PPR:$src, PPR)>;		def : Pat<(nxv4i1 (reinterpret_cast (nxv2i1 PPR:$src))), (COPY_TO_REGCLASS PPR:$src, PPR)>;
def : Pat<(nxv2i1 (reinterpret_cast (nxv16i1 PPR:$src))), (COPY_TO_REGCLASS PPR:$src, PPR)>;		def : Pat<(nxv2i1 (reinterpret_cast (nxv16i1 PPR:$src))), (COPY_TO_REGCLASS PPR:$src, PPR)>;
def : Pat<(nxv2i1 (reinterpret_cast (nxv8i1 PPR:$src))), (COPY_TO_REGCLASS PPR:$src, PPR)>;		def : Pat<(nxv2i1 (reinterpret_cast (nxv8i1 PPR:$src))), (COPY_TO_REGCLASS PPR:$src, PPR)>;
def : Pat<(nxv2i1 (reinterpret_cast (nxv4i1 PPR:$src))), (COPY_TO_REGCLASS PPR:$src, PPR)>;		def : Pat<(nxv2i1 (reinterpret_cast (nxv4i1 PPR:$src))), (COPY_TO_REGCLASS PPR:$src, PPR)>;

def : Pat<(nxv16i1 (and PPR:$Ps1, PPR:$Ps2)),		def : Pat<(nxv16i1 (and PPR:$Ps1, PPR:$Ps2)),
(AND_PPzPP (PTRUE_B 31), PPR:$Ps1, PPR:$Ps2)>;		(AND_PPzPP (PTRUE_B 31), PPR:$Ps1, PPR:$Ps2)>;
def : Pat<(nxv8i1 (and PPR:$Ps1, PPR:$Ps2)),		def : Pat<(nxv8i1 (and PPR:$Ps1, PPR:$Ps2)),
(AND_PPzPP (PTRUE_H 31), PPR:$Ps1, PPR:$Ps2)>;		(AND_PPzPP (PTRUE_H 31), PPR:$Ps1, PPR:$Ps2)>;
def : Pat<(nxv4i1 (and PPR:$Ps1, PPR:$Ps2)),		def : Pat<(nxv4i1 (and PPR:$Ps1, PPR:$Ps2)),
		paulwalker-arm (inline comment, Done): I doubt you need these anymore.
(AND_PPzPP (PTRUE_S 31), PPR:$Ps1, PPR:$Ps2)>;		(AND_PPzPP (PTRUE_S 31), PPR:$Ps1, PPR:$Ps2)>;
def : Pat<(nxv2i1 (and PPR:$Ps1, PPR:$Ps2)),		def : Pat<(nxv2i1 (and PPR:$Ps1, PPR:$Ps2)),
(AND_PPzPP (PTRUE_D 31), PPR:$Ps1, PPR:$Ps2)>;		(AND_PPzPP (PTRUE_D 31), PPR:$Ps1, PPR:$Ps2)>;

// Add more complex addressing modes here as required		// Add more complex addressing modes here as required
multiclass pred_load<ValueType Ty, ValueType PredTy, SDPatternOperator Load,		multiclass pred_load<ValueType Ty, ValueType PredTy, SDPatternOperator Load,
Instruction RegRegInst, Instruction RegImmInst, ComplexPattern AddrCP> {		Instruction RegRegInst, Instruction RegImmInst, ComplexPattern AddrCP> {
// reg + reg		// reg + reg
[...]

llvm/test/CodeGen/AArch64/sve-split-fcvt.ll

[...]
	; CHECK-NEXT: ptrue p0.d			; CHECK-NEXT: ptrue p0.d
	; CHECK-NEXT: uunpkhi z2.d, z0.s			; CHECK-NEXT: uunpkhi z2.d, z0.s
	; CHECK-NEXT: fcvtzu z0.d, p0/m, z1.s			; CHECK-NEXT: fcvtzu z0.d, p0/m, z1.s
	; CHECK-NEXT: fcvtzu z1.d, p0/m, z2.s			; CHECK-NEXT: fcvtzu z1.d, p0/m, z2.s
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%res = fptoui <vscale x 4 x float> %a to <vscale x 4 x i64>			%res = fptoui <vscale x 4 x float> %a to <vscale x 4 x i64>
	ret <vscale x 4 x i64> %res			ret <vscale x 4 x i64> %res
	}			}

				; SINT_TO_FP

				; Split operand
				define <vscale x 4 x float> @scvtf_s_nxv4i64(<vscale x 4 x i64> %a) {
				; CHECK-LABEL: scvtf_s_nxv4i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: scvtf z1.s, p0/m, z1.d
				; CHECK-NEXT: scvtf z0.s, p0/m, z0.d
				; CHECK-NEXT: uzp1 z0.s, z0.s, z1.s
				; CHECK-NEXT: ret
				%res = sitofp <vscale x 4 x i64> %a to <vscale x 4 x float>
				ret <vscale x 4 x float> %res
				}

				define <vscale x 8 x half> @scvtf_h_nxv8i64(<vscale x 8 x i64> %a) {
				; CHECK-LABEL: scvtf_h_nxv8i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: scvtf z3.h, p0/m, z3.d
				; CHECK-NEXT: scvtf z2.h, p0/m, z2.d
				; CHECK-NEXT: scvtf z1.h, p0/m, z1.d
				; CHECK-NEXT: scvtf z0.h, p0/m, z0.d
				; CHECK-NEXT: uzp1 z2.s, z2.s, z3.s
				; CHECK-NEXT: uzp1 z0.s, z0.s, z1.s
				; CHECK-NEXT: uzp1 z0.h, z0.h, z2.h
				; CHECK-NEXT: ret
				%res = sitofp <vscale x 8 x i64> %a to <vscale x 8 x half>
				ret <vscale x 8 x half> %res
				}

				; Split result
				define <vscale x 16 x float> @scvtf_s_nxv16i8(<vscale x 16 x i8> %a) {
				; CHECK-LABEL: scvtf_s_nxv16i8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sunpklo z1.h, z0.b
				; CHECK-NEXT: sunpkhi z0.h, z0.b
				; CHECK-NEXT: ptrue p0.s
				; CHECK-NEXT: sunpklo z2.s, z1.h
				; CHECK-NEXT: sunpkhi z1.s, z1.h
				; CHECK-NEXT: sunpklo z3.s, z0.h
				; CHECK-NEXT: sunpkhi z4.s, z0.h
				; CHECK-NEXT: scvtf z0.s, p0/m, z2.s
				; CHECK-NEXT: scvtf z1.s, p0/m, z1.s
				; CHECK-NEXT: scvtf z2.s, p0/m, z3.s
				; CHECK-NEXT: scvtf z3.s, p0/m, z4.s
				; CHECK-NEXT: ret
				%res = sitofp <vscale x 16 x i8> %a to <vscale x 16 x float>
				ret <vscale x 16 x float> %res
				}

				define <vscale x 4 x double> @scvtf_d_nxv4i32(<vscale x 4 x i32> %a) {
				; CHECK-LABEL: scvtf_d_nxv4i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sunpklo z1.d, z0.s
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: sunpkhi z2.d, z0.s
				; CHECK-NEXT: scvtf z0.d, p0/m, z1.d
				; CHECK-NEXT: scvtf z1.d, p0/m, z2.d
				; CHECK-NEXT: ret
				%res = sitofp <vscale x 4 x i32> %a to <vscale x 4 x double>
				ret <vscale x 4 x double> %res
				}

				define <vscale x 4 x double> @scvtf_d_nxv4i1(<vscale x 4 x i1> %a) {
				; CHECK-LABEL: scvtf_d_nxv4i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: pfalse p1.b
				; CHECK-NEXT: zip1 p3.s, p0.s, p1.s
				; CHECK-NEXT: zip2 p0.s, p0.s, p1.s
				; CHECK-NEXT: ptrue p2.d
				; CHECK-NEXT: mov z0.d, p3/z, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: mov z1.d, p0/z, #-1 // =0xffffffffffffffff
				; CHECK-NEXT: scvtf z0.d, p2/m, z0.d
				; CHECK-NEXT: scvtf z1.d, p2/m, z1.d
				; CHECK-NEXT: ret
				%res = sitofp <vscale x 4 x i1> %a to <vscale x 4 x double>
				ret <vscale x 4 x double> %res
				}

				; UINT_TO_FP

				; Split operand
				define <vscale x 4 x float> @ucvtf_s_nxv4i64(<vscale x 4 x i64> %a) {
				; CHECK-LABEL: ucvtf_s_nxv4i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: ucvtf z1.s, p0/m, z1.d
				; CHECK-NEXT: ucvtf z0.s, p0/m, z0.d
				; CHECK-NEXT: uzp1 z0.s, z0.s, z1.s
				; CHECK-NEXT: ret
				%res = uitofp <vscale x 4 x i64> %a to <vscale x 4 x float>
				ret <vscale x 4 x float> %res
				}

				define <vscale x 8 x half> @ucvtf_h_nxv8i64(<vscale x 8 x i64> %a) {
				; CHECK-LABEL: ucvtf_h_nxv8i64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: ucvtf z3.h, p0/m, z3.d
				; CHECK-NEXT: ucvtf z2.h, p0/m, z2.d
				; CHECK-NEXT: ucvtf z1.h, p0/m, z1.d
				; CHECK-NEXT: ucvtf z0.h, p0/m, z0.d
				; CHECK-NEXT: uzp1 z2.s, z2.s, z3.s
				; CHECK-NEXT: uzp1 z0.s, z0.s, z1.s
				; CHECK-NEXT: uzp1 z0.h, z0.h, z2.h
				; CHECK-NEXT: ret
				%res = uitofp <vscale x 8 x i64> %a to <vscale x 8 x half>
				ret <vscale x 8 x half> %res
				}

				; Split result
				define <vscale x 4 x double> @ucvtf_d_nxv4i32(<vscale x 4 x i32> %a) {
				; CHECK-LABEL: ucvtf_d_nxv4i32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: uunpklo z1.d, z0.s
				; CHECK-NEXT: ptrue p0.d
				; CHECK-NEXT: uunpkhi z2.d, z0.s
				; CHECK-NEXT: ucvtf z0.d, p0/m, z1.d
				; CHECK-NEXT: ucvtf z1.d, p0/m, z2.d
				; CHECK-NEXT: ret
				%res = uitofp <vscale x 4 x i32> %a to <vscale x 4 x double>
				ret <vscale x 4 x double> %res
				}

				define <vscale x 4 x double> @ucvtf_d_nxv4i1(<vscale x 4 x i1> %a) {
				; CHECK-LABEL: ucvtf_d_nxv4i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: pfalse p1.b
				; CHECK-NEXT: zip1 p3.s, p0.s, p1.s
				; CHECK-NEXT: zip2 p0.s, p0.s, p1.s
				; CHECK-NEXT: ptrue p2.d
				; CHECK-NEXT: mov z0.d, p3/z, #1 // =0x1
				; CHECK-NEXT: mov z1.d, p0/z, #1 // =0x1
				; CHECK-NEXT: ucvtf z0.d, p2/m, z0.d
				; CHECK-NEXT: ucvtf z1.d, p2/m, z1.d
				; CHECK-NEXT: ret
				%res = uitofp <vscale x 4 x i1> %a to <vscale x 4 x double>
				ret <vscale x 4 x double> %res
				}

This is an archive of the discontinued LLVM Phabricator instance.

[SVE][CodeGen] Legalisation of integer -> floating point conversions
ClosedPublic
