This is an archive of the discontinued LLVM Phabricator instance.

Differential D108725

[AArch64][GlobalISel] Implement custom legalization for s32/s64 G_FCOPYSIGN
ClosedPublic

Authored by paquette on Aug 25 2021, 1:28 PM.

Details

Reviewers

aemerson
jroelofs

Commits

rGa7aaafde2ef5: [AArch64][GlobalISel] Implement custom legalization for s32/s64 G_FCOPYSIGN

Summary

This is intended to be equivalent to the s32 + s64 cases in AArch64TargetLowering::LowerFCOPYSIGN.

Widen everything and then use G_BIT + a mask to handle the actual copysign operation. Then, narrow back down to s32/s64.
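
As a rough illustration of the bit-select step, here is a minimal scalar sketch in C++ that works on raw IEEE-754 single-precision bit patterns. It assumes G_BIT behaves like the AArch64 BIT instruction (a bit is taken from the second source wherever the mask bit is set). The helper names `bitInsert` and `copySignBits32` are hypothetical and are not part of the patch, which operates on 128-bit vectors rather than scalars.

```cpp
#include <cstdint>

// Hypothetical model of a bitwise insert-if-true: for every bit set in Mask,
// take the bit from Src; otherwise keep the bit from Dst.
constexpr uint32_t bitInsert(uint32_t Dst, uint32_t Src, uint32_t Mask) {
  return (Dst & ~Mask) | (Src & Mask);
}

// copysign on raw f32 bit patterns: keep the magnitude bits of Val and take
// only the sign bit (bit 31) from Sign.
constexpr uint32_t copySignBits32(uint32_t Val, uint32_t Sign) {
  return bitInsert(Val, Sign, 0x80000000u);
}

// 0x3FC00000 is 1.5f and 0xC0000000 is -2.0f; 0xBFC00000 is -1.5f.
static_assert(copySignBits32(0x3FC00000u, 0xC0000000u) == 0xBFC00000u,
              "sign of -2.0f copied onto 1.5f");
// A positive sign source clears the sign bit: -1.5f becomes 1.5f.
static_assert(copySignBits32(0xBFC00000u, 0x3FC00000u) == 0x3FC00000u,
              "sign of 1.5f copied onto -1.5f");
```

In the vector form only lane 0 carries meaningful data; the remaining lanes come from the undef vector and are discarded when narrowing back down.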

I wasn't sure about what the best/most canonical INSERT_SUBREG-selectable pattern is. I chose G_INSERT_VECTOR_ELT + an undef vector because it produces reasonably okay codegen. (It doesn't produce INSERT_SUBREG right now though.) If there's a better way to do this then I'm happy to change it.

We also have a couple codegen deficiencies with how we emit vector constants right now. (We need a GISel equivalent to the tryAdvSIMDModImm64 stuff)

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

paquette created this revision. (Aug 25 2021, 1:28 PM)

Herald added subscribers: hiraditya, kristof.beyls, rovka. (Aug 25 2021, 1:28 PM)

paquette requested review of this revision. (Aug 25 2021, 1:28 PM)

Herald added a project: Restricted Project. (Aug 25 2021, 1:28 PM)

paquette added a parent revision: D108714: [AArch64][GlobalISel] Add a target-specific G_BIT opcode. (Aug 25 2021, 1:28 PM)

Harbormaster completed remote builds in B121220: Diff 368713. (Aug 25 2021, 2:06 PM)

aemerson added inline comments. (Aug 25 2021, 3:42 PM)

llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
1470	What if we use G_MERGE instead? Do we get an INSERT_SUBREG?

paquette added a comment. (Aug 25 2021, 5:15 PM)

This comment was removed by paquette.

llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
1470	Not quite.

We get the following for s32 with G_MERGE_VALUES:

legalize_s32:
	adrp	x8, .LCPI0_0
	mov	v0.s[1], v0.s[0]
	mov	v1.s[1], v0.s[0]
	ldr	q2, [x8, :lo12:.LCPI0_0]
	mov	v0.s[2], v0.s[0]
	mov	v0.s[3], v0.s[0]
	mov	v1.s[2], v0.s[0]
	mov	v1.s[3], v0.s[0]
	bit	v0.16b, v1.16b, v2.16b
	ret

Meanwhile, with SDAG we get

	movi	v2.4s, #128, lsl #24 ; We should emit the constant like this, but we don't have that optimization
	bit	v0.16b, v1.16b, v2.16b
	ret

We can probably change the selector code to recognize the pattern though. Using G_INSERT_VECTOR_ELT is only slightly better.
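
For reference, the immediate in the SDAG listing above expands to the per-lane f32 sign mask. The C++ sketch below only checks that arithmetic. `shiftedImm32` is a hypothetical helper name, and it models just the "imm8, lsl #shift" form shown in the listing, not the full set of AdvSIMD modified-immediate encodings.

```cpp
#include <cstdint>

// Hypothetical model of an 8-bit immediate shifted left into a 32-bit lane,
// as in "movi v2.4s, #128, lsl #24".
constexpr uint32_t shiftedImm32(uint8_t Imm8, unsigned Shift) {
  return static_cast<uint32_t>(Imm8) << Shift;
}

// 128 << 24 sets only bit 31, which is the f32 sign bit.
static_assert(shiftedImm32(128, 24) == 0x80000000u,
              "movi immediate equals the f32 sign mask");
// The same bit pattern as the signed constant -2147483648.
static_assert(shiftedImm32(128, 24) == static_cast<uint32_t>(INT32_MIN),
              "same bits as i32 -2147483648");
```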

aemerson accepted this revision. (Aug 25 2021, 9:17 PM)

aemerson added inline comments.

llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
1470	Ok, let's go with the insert for now.

This revision is now accepted and ready to land. (Aug 25 2021, 9:17 PM)

Closed by commit rGa7aaafde2ef5: [AArch64][GlobalISel] Implement custom legalization for s32/s64 G_FCOPYSIGN (authored by paquette). (Sep 28 2022, 4:03 PM)

This revision was automatically updated to reflect the committed changes.

paquette added a commit: rGa7aaafde2ef5: [AArch64][GlobalISel] Implement custom legalization for s32/s64 G_FCOPYSIGN.

Herald added a project: Restricted Project. (Sep 28 2022, 4:03 PM)

Revision Contents

Path	Size
llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.h	1 line
llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp	67 lines
llvm/test/CodeGen/AArch64/GlobalISel/legalize-fcopysign.mir	56 lines
llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir	4 lines

Diff 463703

llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.h

[...]
 private:
   bool legalizeRotate(MachineInstr &MI, MachineRegisterInfo &MRI,
                       LegalizerHelper &Helper) const;
   bool legalizeCTPOP(MachineInstr &MI, MachineRegisterInfo &MRI,
                      LegalizerHelper &Helper) const;
   bool legalizeAtomicCmpxchg128(MachineInstr &MI, MachineRegisterInfo &MRI,
                                 LegalizerHelper &Helper) const;
   bool legalizeCTTZ(MachineInstr &MI, LegalizerHelper &Helper) const;
   bool legalizeMemOps(MachineInstr &MI, LegalizerHelper &Helper) const;
+  bool legalizeFCopySign(MachineInstr &MI, LegalizerHelper &Helper) const;
   const AArch64Subtarget *ST;
 };
 } // End llvm namespace.
 #endif

llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp

[...]
   getActionDefinitionsBuilder({G_FMAXIMUM, G_FMINIMUM})
       .legalFor({MinFPScalar, s32, s64})
       .minScalar(0, MinFPScalar);

   // TODO: Libcall support for s128.
   // TODO: s16 should be legal with full FP16 support.
   getActionDefinitionsBuilder({G_LROUND, G_LLROUND})
       .legalFor({{s64, s32}, {s64, s64}});

+  // TODO: Custom legalization for vector types.
+  // TODO: Custom legalization for mismatched types.
+  // TODO: s16 support.
+  getActionDefinitionsBuilder(G_FCOPYSIGN).customFor({{s32, s32}, {s64, s64}});
+
   getLegacyLegalizerInfo().computeTables();
   verify(*ST.getInstrInfo());
 }

 bool AArch64LegalizerInfo::legalizeCustom(LegalizerHelper &Helper,
                                           MachineInstr &MI) const {
   MachineIRBuilder &MIRBuilder = Helper.MIRBuilder;
   MachineRegisterInfo &MRI = *MIRBuilder.getMRI();
[...]
   case TargetOpcode::G_ATOMIC_CMPXCHG:
     return legalizeAtomicCmpxchg128(MI, MRI, Helper);
   case TargetOpcode::G_CTTZ:
     return legalizeCTTZ(MI, Helper);
   case TargetOpcode::G_BZERO:
   case TargetOpcode::G_MEMCPY:
   case TargetOpcode::G_MEMMOVE:
   case TargetOpcode::G_MEMSET:
     return legalizeMemOps(MI, Helper);
+  case TargetOpcode::G_FCOPYSIGN:
+    return legalizeFCopySign(MI, Helper);
   }

   llvm_unreachable("expected switch to return");
 }

 bool AArch64LegalizerInfo::legalizeRotate(MachineInstr &MI,
                                           MachineRegisterInfo &MRI,
                                           LegalizerHelper &Helper) const {
[...]
   if (MI.getOpcode() == TargetOpcode::G_MEMSET) {
     Register ZExtValueReg =
         MIRBuilder.buildAnyExt(LLT::scalar(64), Value).getReg(0);
     Value.setReg(ZExtValueReg);
     return true;
   }

   return false;
 }

+bool AArch64LegalizerInfo::legalizeFCopySign(MachineInstr &MI,
+                                             LegalizerHelper &Helper) const {
+  MachineIRBuilder &MIRBuilder = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *MIRBuilder.getMRI();
+  Register Dst = MI.getOperand(0).getReg();
+  LLT DstTy = MRI.getType(Dst);
+  assert(DstTy.isScalar() && "Only expected scalars right now!");
+  const unsigned DstSize = DstTy.getSizeInBits();
+  assert((DstSize == 32 || DstSize == 64) && "Unexpected dst type!");
+  assert(MRI.getType(MI.getOperand(2).getReg()) == DstTy &&
+         "Expected homogeneous types!");
+
+  // We want to materialize a mask with the high bit set.
+  uint64_t EltMask;
+  LLT VecTy;
+
+  // TODO: s16 support.
+  switch (DstSize) {
+  default:
+    llvm_unreachable("Unexpected type for G_FCOPYSIGN!");
+  case 64: {
+    // AdvSIMD immediate moves cannot materialize our mask in a single
+    // instruction for 64-bit elements. Instead, materialize zero and then
+    // negate it.
+    EltMask = 0;
+    VecTy = LLT::fixed_vector(2, DstTy);
+    break;
+  }
+  case 32:
+    EltMask = 0x80000000ULL;
+    VecTy = LLT::fixed_vector(4, DstTy);
+    break;
+  }
+
+  // Widen In1 and In2 to 128 bits. We want these to eventually become
+  // INSERT_SUBREGs.
+  auto Undef = MIRBuilder.buildUndef(VecTy);
+  auto Zero = MIRBuilder.buildConstant(DstTy, 0);
+  auto Ins1 = MIRBuilder.buildInsertVectorElement(
+      VecTy, Undef, MI.getOperand(1).getReg(), Zero);
+  auto Ins2 = MIRBuilder.buildInsertVectorElement(
+      VecTy, Undef, MI.getOperand(2).getReg(), Zero);
+
+  // Construct the mask.
+  auto Mask = MIRBuilder.buildConstant(VecTy, EltMask);
+  if (DstSize == 64)
+    Mask = MIRBuilder.buildFNeg(VecTy, Mask);
+
+  auto Sel = MIRBuilder.buildInstr(AArch64::G_BIT, {VecTy}, {Ins1, Ins2, Mask});
+
+  // Build an unmerge whose 0th elt is the original G_FCOPYSIGN destination. We
+  // want this to eventually become an EXTRACT_SUBREG.
+  SmallVector<Register, 2> DstRegs(1, Dst);
+  for (unsigned I = 1, E = VecTy.getNumElements(); I < E; ++I)
+    DstRegs.push_back(MRI.createGenericVirtualRegister(DstTy));
+  MIRBuilder.buildUnmerge(DstRegs, Sel);
+  MI.eraseFromParent();
+  return true;
+}
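
The 64-bit path above builds its mask by negating a zero vector. A minimal scalar sketch of why that yields the sign mask, written in C++ on raw f64 bit patterns, follows. It assumes that floating-point negation flips only the sign bit, as in IEEE-754; the names `fnegBits64`, `SignMask64`, and `copySignBits64` are hypothetical and do not appear in the patch.

```cpp
#include <cstdint>

// Hypothetical model of fneg on a raw f64 bit pattern: flip bit 63 only.
constexpr uint64_t fnegBits64(uint64_t Bits) {
  return Bits ^ 0x8000000000000000ULL;
}

// Negating +0.0 (all bits clear) gives -0.0, whose only set bit is the sign
// bit, so it can serve as the per-lane mask.
constexpr uint64_t SignMask64 = fnegBits64(0);
static_assert(SignMask64 == 0x8000000000000000ULL,
              "fneg(+0.0) is the f64 sign mask");

// copysign on raw f64 bit patterns using the derived mask.
constexpr uint64_t copySignBits64(uint64_t Val, uint64_t Sign) {
  return (Val & ~SignMask64) | (Sign & SignMask64);
}

// 0x3FF0000000000000 is 1.0 and 0x8000000000000000 is -0.0; the result
// 0xBFF0000000000000 is -1.0.
static_assert(copySignBits64(0x3FF0000000000000ULL, 0x8000000000000000ULL) ==
                  0xBFF0000000000000ULL,
              "sign of -0.0 copied onto 1.0");
```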

llvm/test/CodeGen/AArch64/GlobalISel/legalize-fcopysign.mir

This file was added.

# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -mtriple=aarch64 -run-pass=legalizer -verify-machineinstrs %s -o - | FileCheck %s

...
---
name: legalize_s32
tracksRegLiveness: true
body: |
  bb.0:
    liveins: $s0, $s1
    ; CHECK-LABEL: name: legalize_s32
    ; CHECK: liveins: $s0, $s1
    ; CHECK: %val:_(s32) = COPY $s0
    ; CHECK: %sign:_(s32) = COPY $s1
    ; CHECK: [[DEF:%[0-9]+]]:_(<4 x s32>) = G_IMPLICIT_DEF
    ; CHECK: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
    ; CHECK: [[IVEC:%[0-9]+]]:_(<4 x s32>) = G_INSERT_VECTOR_ELT [[DEF]], %val(s32), [[C]](s32)
    ; CHECK: [[IVEC1:%[0-9]+]]:_(<4 x s32>) = G_INSERT_VECTOR_ELT [[DEF]], %sign(s32), [[C]](s32)
    ; CHECK: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 -2147483648
    ; CHECK: [[BUILD_VECTOR:%[0-9]+]]:_(<4 x s32>) = G_BUILD_VECTOR [[C1]](s32), [[C1]](s32), [[C1]](s32), [[C1]](s32)
    ; CHECK: [[BIT:%[0-9]+]]:_(<4 x s32>) = G_BIT [[IVEC]], [[IVEC1]], [[BUILD_VECTOR]]
    ; CHECK: %fcopysign:_(s32), %10:_(s32), %11:_(s32), %12:_(s32) = G_UNMERGE_VALUES [[BIT]](<4 x s32>)
    ; CHECK: $s0 = COPY %fcopysign(s32)
    ; CHECK: RET_ReallyLR implicit $s0
    %val:_(s32) = COPY $s0
    %sign:_(s32) = COPY $s1
    %fcopysign:_(s32) = G_FCOPYSIGN %val, %sign(s32)
    $s0 = COPY %fcopysign(s32)
    RET_ReallyLR implicit $s0

...
---
name: legalize_s64
tracksRegLiveness: true
body: |
  bb.0:
    liveins: $d0, $d1
    ; CHECK-LABEL: name: legalize_s64
    ; CHECK: liveins: $d0, $d1
    ; CHECK: %val:_(s64) = COPY $d0
    ; CHECK: %sign:_(s64) = COPY $d1
    ; CHECK: [[DEF:%[0-9]+]]:_(<2 x s64>) = G_IMPLICIT_DEF
    ; CHECK: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 0
    ; CHECK: [[IVEC:%[0-9]+]]:_(<2 x s64>) = G_INSERT_VECTOR_ELT [[DEF]], %val(s64), [[C]](s64)
    ; CHECK: [[IVEC1:%[0-9]+]]:_(<2 x s64>) = G_INSERT_VECTOR_ELT [[DEF]], %sign(s64), [[C]](s64)
    ; CHECK: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s64>) = G_BUILD_VECTOR [[C]](s64), [[C]](s64)
    ; CHECK: [[FNEG:%[0-9]+]]:_(<2 x s64>) = G_FNEG [[BUILD_VECTOR]]
    ; CHECK: [[BIT:%[0-9]+]]:_(<2 x s64>) = G_BIT [[IVEC]], [[IVEC1]], [[FNEG]]
    ; CHECK: %fcopysign:_(s64), %10:_(s64) = G_UNMERGE_VALUES [[BIT]](<2 x s64>)
    ; CHECK: $d0 = COPY %fcopysign(s64)
    ; CHECK: RET_ReallyLR implicit $d0
    %val:_(s64) = COPY $d0
    %sign:_(s64) = COPY $d1
    %fcopysign:_(s64) = G_FCOPYSIGN %val, %sign(s64)
    $d0 = COPY %fcopysign(s64)
    RET_ReallyLR implicit $d0

llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir

[...]
 # DEBUG-NEXT: .. opcode {{[0-9]+}} is aliased to {{[0-9]+}}
 # DEBUG-NEXT: .. type index coverage check SKIPPED: user-defined predicate detected
 # DEBUG-NEXT: .. imm index coverage check SKIPPED: user-defined predicate detected
 # DEBUG-NEXT: G_FABS (opcode {{[0-9]+}}): 1 type index, 0 imm indices
 # DEBUG-NEXT: .. opcode {{[0-9]+}} is aliased to {{[0-9]+}}
 # DEBUG-NEXT: .. type index coverage check SKIPPED: user-defined predicate detected
 # DEBUG-NEXT: .. imm index coverage check SKIPPED: user-defined predicate detected
 # DEBUG-NEXT: G_FCOPYSIGN (opcode {{[0-9]+}}): 2 type indices
-# DEBUG-NEXT: .. type index coverage check SKIPPED: no rules defined
-# DEBUG-NEXT: .. imm index coverage check SKIPPED: no rules defined
+# DEBUG-NEXT: .. the first uncovered type index: 2, OK
+# DEBUG-NEXT: .. the first uncovered imm index: 0, OK
 # DEBUG-NEXT: G_IS_FPCLASS (opcode {{[0-9]+}}): 2 type indices, 0 imm indices
 # DEBUG-NEXT: .. type index coverage check SKIPPED: no rules defined
 # DEBUG-NEXT: .. imm index coverage check SKIPPED: no rules defined
 # DEBUG-NEXT: G_FCANONICALIZE (opcode {{[0-9]+}}): 1 type index, 0 imm indices
 # DEBUG-NEXT: .. type index coverage check SKIPPED: no rules defined
 # DEBUG-NEXT: .. imm index coverage check SKIPPED: no rules defined
 # DEBUG-NEXT: G_FMINNUM (opcode {{[0-9]+}}): 1 type index
 # DEBUG-NEXT: .. opcode {{[0-9]+}} is aliased to {{[0-9]+}}
[...]
