But I don't think this should be starting from a CopyToReg. A CopyToReg usually means we are copying a result into a physical register, such as s0 for a function's return value or a register that holds a call argument. CopyFromReg is the opposite: it reads a value out of the physical registers that carry incoming arguments.

An example like this, with fadd, would preferably use the movi lowering too: https://godbolt.org/z/bc99E85qz

If the FPConstant is still in the DAG at the point that we run instruction selection (I think it should be, but it may be dependent on what AArch64TargetLowering::isFPImmLegal returns), then it may be best to add a tablegen pattern for the cases that we can materialize with the integer movi instruction. Something like the pattern here: https://github.com/llvm/llvm-project/blob/ecff9b65b54c7a4bd79ca2af157c81595678f0ee/llvm/lib/Target/AArch64/AArch64InstrInfo.td#L1545, but with a different fpimm to limit it to cases where the MOVi can be generated, and an EXTRACT_SUBREG to get the s reg. If that doesn't work (the custom ImmLeafs might be a bit much) then perhaps something in AArch64ISelDAGToDAG.cpp?
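
To make the "cases where the MOVi can be generated" condition concrete, here is a minimal standalone sketch (plain C++, not LLVM code) of the bit test such an fpimm predicate would need for the `movi vD.2s, #imm8, lsl #24` form: the f32 bit pattern must have its low 24 bits clear. The names `floatBits` and `moviLsl24Imm` are hypothetical, and the sketch leaves out the other conditions a real pattern would want, such as skipping +0.0 and constants that fmov can already encode.

```cpp
#include <cstdint>
#include <cstring>

// Returns the IEEE-754 single-precision bit pattern of F.
uint32_t floatBits(float F) {
  static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return Bits;
}

// Hypothetical helper: if F's bit pattern is a single byte shifted left by 24
// (low 24 bits all zero), returns that byte; otherwise returns -1.
int moviLsl24Imm(float F) {
  uint32_t Bits = floatBits(F);
  if ((Bits & 0x00ffffffu) != 0)
    return -1;
  return static_cast<int>(Bits >> 24);
}
```

For example, `moviLsl24Imm(2147483648.0f)` yields 79 (0x4f), while `moviLsl24Imm(1.5f)` yields -1 because the bit pattern 0x3fc00000 has bits set below bit 24.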

Allen updated this revision to Diff 411608. Feb 26 2022, 6:59 AM

Allen retitled this revision from [AArch64] Use simd mov to initialise big const float immediate to [AArch64] Use simd mov to materialize big fp constants.

In D120452#3343727, @dmgreen wrote:

Using the MOVI to materialize fp constants sounds like a good idea.

Thanks very much for the detail; refactored using the method suggested above.

Harbormaster completed remote builds in B151614: Diff 411608. Feb 26 2022, 7:07 AM

Allen updated this revision to Diff 411714. Feb 27 2022, 5:22 PM

Harbormaster completed remote builds in B151692: Diff 411714. Feb 27 2022, 5:58 PM

update test cases

fhahn added a subscriber: fhahn. Feb 28 2022, 1:19 AM

fhahn added inline comments.

llvm/test/Transforms/LoopVectorize/AArch64/remat-const-float-simd.ll
3	It looks like the test is completely unrelated to the loop-vectorize pass. It should probably be either moved to `CodeGen/AArch64/` or added to one of the existing tests there.

dmgreen added inline comments. Feb 28 2022, 1:30 AM

llvm/lib/Target/AArch64/AArch64InstrFormats.td
1210 ↗	(On Diff #411714)	This likely doesn't need to be an Operand, it can just be a FPImmLeaf
llvm/lib/Target/AArch64/AArch64InstrInfo.td
1551 ↗	(On Diff #411714)	This doesn't need to define a new instruction, it can use the existing MOVI (which I believe is probably called MOVIv2i32?)
llvm/test/Transforms/LoopVectorize/AArch64/remat-const-float-simd.ll
2	This test file should be in CodeGen/AArch64

Harbormaster completed remote builds in B151720: Diff 411754. Feb 28 2022, 1:46 AM

Allen updated this revision to Diff 411776. Feb 28 2022, 3:24 AM

Harbormaster completed remote builds in B151730: Diff 411776. Feb 28 2022, 4:16 AM

a) move the test case remat-const-float-simd.ll to CodeGen/AArch64
b) use MOVIv2i32 and delete the unneeded instruction MOVIv2s_ns
c) delete the Operand<f32> and use just an FPImmLeaf

Allen added a reviewer: fhahn. Feb 28 2022, 5:02 AM

Harbormaster completed remote builds in B151738: Diff 411786. Feb 28 2022, 5:21 AM

In D120452#3348692, @Allen wrote:

a) move the test case remat-const-float-simd.ll to CodeGen/AArch64
b) use MOVIv2i32 and delete the unneeded instruction MOVIv2s_ns
c) delete the Operand<f32> and use just an FPImmLeaf

Addressed all comments.

llvm/lib/Target/AArch64/AArch64InstrFormats.td
1210 ↗	(On Diff #411714)	Yes, verified ok!
llvm/lib/Target/AArch64/AArch64InstrInfo.td
1551 ↗	(On Diff #411714)	Thanks, deleted the unneeded instruction MOVIv2s_ns
1551 ↗	(On Diff #411608)	It's strange that the build fails here with the following error, since it works on my local Linux machine: AArch64InstrInfo.td:1551:1: error: Type set is empty for each HW mode in 'MOVIv2s_ns' def MOVIv2s_ns : BaseSIMDModifiedImmVectorShift<1, 1, 0b10, V128, "movi", ".2s",

ping ?

Herald added a project: Restricted Project. Mar 2 2022, 12:50 AM

Thanks for the updates. There are other forms of MOVI that might also be useful for some fp immediates. The one we have here (shift an i8 by 24) I imagine is the most useful, but the others could be good in places too. The MOVN's too. I'm not sure what the best way to generalise this would be. As far as I can tell it needs lots of checks to the various isAdvSIMDModImmTypeXYZ methods wherever it is? This seems like a good beginning at least.
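
As a rough illustration of what generalising could involve, the sketch below (plain C++, independent of LLVM) checks a 32-bit pattern against the four per-lane shifted-byte forms (`lsl #0`, `#8`, `#16`, `#24`) and, for an inverted move, checks the complement. `shiftedByteLsl` and `invertedShiftedByteLsl` are hypothetical names, and the sketch does not attempt the remaining modified-immediate encodings.

```cpp
#include <cstdint>

// Hypothetical helper: if Bits is a single byte shifted left by 0, 8, 16 or 24
// (every other bit zero), returns that shift; otherwise returns -1.
// Zero matches with shift 0.
int shiftedByteLsl(uint32_t Bits) {
  for (int Shift = 0; Shift <= 24; Shift += 8)
    if ((Bits & ~(0xffu << Shift)) == 0)
      return Shift;
  return -1;
}

// Same test on the complement, which is the shape an inverted move produces.
int invertedShiftedByteLsl(uint32_t Bits) {
  return shiftedByteLsl(~Bits);
}
```

With these helpers, 0x4f000000 matches the shift-by-24 form handled in this patch, 0x00ab0000 would need the shift-by-16 form, and 0xb0ffffff only matches through the inverted check.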

llvm/lib/Target/AArch64/AArch64InstrFormats.td
1210 ↗	(On Diff #411786)	It's likely worth calling this out as an "AdvSIMDModImmType4" constant somehow. Maybe call it fpimm32SIMDModImmType4? Same for the XForm.
llvm/lib/Target/AArch64/AArch64InstrInfo.td
6146 ↗	(On Diff #411786)	It is quite uncommon to not have NEON, but can you add a predicate for it: `let Predicates = [HasNEON] in {`. It might be worth adding a run line without NEON (`-mattr=-neon`) for the new remat test case too, to show the difference.
llvm/test/CodeGen/AArch64/remat-const-float-simd.ll
7 ↗	(On Diff #411786)	0x7fffffff -> 2147483648, as the 0x7fffffff gets rounded.
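
The rounding can be checked directly. The sketch below is standalone C++ with hypothetical helper names (`toFloat`, `doubleBits`); it assumes IEEE-754 types and the default round-to-nearest conversion.

```cpp
#include <cstdint>
#include <cstring>

// Converts an integer to float, as float(0x7fffffff) in the test comment does.
float toFloat(int32_t V) { return static_cast<float>(V); }

// Bit pattern of V widened to double, which is the form LLVM IR uses when it
// writes a float constant in hexadecimal.
uint64_t doubleBits(float V) {
  double D = V;
  uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof(Bits));
  return Bits;
}
```

`toFloat(0x7fffffff)` compares equal to `2147483648.0f`, because 2147483647 needs 31 significant bits and a float holds only 24. `doubleBits` of that value is 0x41E0000000000000, the literal used in the test's `ret float`.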

dmgreen added reviewers: efriedma, sdesmalen, david-arm. Mar 2 2022, 3:07 AM

Allen updated this revision to Diff 412396. Mar 2 2022, 6:33 AM

update the test case to use 2147483648 instead of 0x7fffffff, since the latter gets rounded

Harbormaster completed remote builds in B152153: Diff 412402. Mar 2 2022, 6:47 AM

Thanks. LGTM

This revision is now accepted and ready to land. Mar 2 2022, 8:13 AM

This revision was landed with ongoing or failed builds. Mar 4 2022, 8:36 AM

Closed by commit rG7a605ab7bfbc: [AArch64] Use simd mov to materialize big fp constants (authored by Allen, committed by congzhe).

This revision was automatically updated to reflect the committed changes.

congzhe added a commit: rG7a605ab7bfbc: [AArch64] Use simd mov to materialize big fp constants.

Diff 410998

llvm/lib/Target/AArch64/AArch64ISelLowering.h

[... 819 lines hidden ...] bool functionArgumentNeedsConsecutiveRegisters(
       Type *Ty, CallingConv::ID CallConv, bool isVarArg,
       const DataLayout &DL) const override;

   /// Used for exception handling on Win64.
   bool needsFixedCatchObjects() const override;

   bool fallBackToDAGISel(const Instruction &Inst) const override;

+  SDValue LowerCopyToReg(SDValue Op, SelectionDAG &DAG) const;
+
   /// SVE code generation for fixed length vectors does not custom lower
   /// BUILD_VECTOR. This makes BUILD_VECTOR legalisation a source of stores to
   /// merge. However, merging them creates a BUILD_VECTOR that is just as
   /// illegal as the original, thus leading to an infinite legalisation loop.
   /// NOTE: Once BUILD_VECTOR is legal or can be custom lowered for all legal
   /// vector types this override can be removed.
   bool mergeStoresAfterLegalization(EVT VT) const override;

[... 315 lines hidden ...]

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp


[... 244 lines hidden ...] AArch64TargetLowering::AArch64TargetLowering(const TargetMachine &TM,
   // When comparing vectors the result sets the different elements in the
   // vector to all-one or all-zero.
   setBooleanVectorContents(ZeroOrNegativeOneBooleanContent);

   // Set up the register classes.
   addRegisterClass(MVT::i32, &AArch64::GPR32allRegClass);
   addRegisterClass(MVT::i64, &AArch64::GPR64allRegClass);

+  setOperationAction(ISD::CopyToReg, MVT::Other, Custom);
+
   if (Subtarget->hasLS64()) {
     addRegisterClass(MVT::i64x8, &AArch64::GPR64x8ClassRegClass);
     setOperationAction(ISD::LOAD, MVT::i64x8, Custom);
     setOperationAction(ISD::STORE, MVT::i64x8, Custom);
   }

   if (Subtarget->hasFPARMv8()) {
     addRegisterClass(MVT::f16, &AArch64::FPR16RegClass);
[... 4,815 lines hidden ...] SDValue AArch64TargetLowering::LowerOperation(SDValue Op,
                                               SelectionDAG &DAG) const {
   LLVM_DEBUG(dbgs() << "Custom lowering: ");
   LLVM_DEBUG(Op.dump());

   switch (Op.getOpcode()) {
   default:
     llvm_unreachable("unimplemented operand");
     return SDValue();
+  case ISD::CopyToReg:
+    return LowerCopyToReg(Op, DAG);
   case ISD::BITCAST:
     return LowerBITCAST(Op, DAG);
   case ISD::GlobalAddress:
     return LowerGlobalAddress(Op, DAG);
   case ISD::GlobalTLSAddress:
     return LowerGlobalTLSAddress(Op, DAG);
   case ISD::SETCC:
   case ISD::STRICT_FSETCC:
[... 234 lines hidden ...] case ISD::CTLZ:
     return LowerToPredicatedOp(Op, DAG, AArch64ISD::CTLZ_MERGE_PASSTHRU);
   case ISD::CTTZ:
     return LowerCTTZ(Op, DAG);
   case ISD::VECTOR_SPLICE:
     return LowerVECTOR_SPLICE(Op, DAG);
   }
 }

+static SDValue
+tryAdvSIMDModImm32(unsigned NewOp, SDValue Op, SelectionDAG &DAG,
+                   const APInt &Bits, const SDValue *LHS = nullptr);
+static bool resolveBuildVector(BuildVectorSDNode *BVN, APInt &CnstBits,
+                               APInt &UndefBits);
+
+SDValue AArch64TargetLowering::LowerCopyToReg(SDValue Op,
+                                              SelectionDAG &DAG) const {
+  SDValue Chain = Op->getOperand(0);
+  SDValue LHS = Op->getOperand(1);
+  SDValue RHS = Op->getOperand(2);
+  ConstantFPSDNode *CFP = dyn_cast<ConstantFPSDNode>(RHS);
+  EVT VT = LHS->getValueType(0);
+  if (!CFP || !LHS->hasNUsesOfValue(2,0) ||
+      VT.getSimpleVT().SimpleTy != MVT::f32)
+    return SDValue();
+
+  const APFloat &FPVal = CFP->getValueAPF();
+  const APInt ImmInt = FPVal.bitcastToAPInt();
+  uint64_t Imm = ImmInt.getZExtValue();
+  // Skip getFP32Imm as related value already deal with fmov.
+  if (AArch64_AM::getFP32Imm(ImmInt) != -1 || FPVal.isPosZero() ||
+      !AArch64_AM::isAdvSIMDModImmType4(Imm << 32 | Imm))
+    return SDValue();
+
+  SDLoc dl(Op);
+  MVT VecTy = MVT::v2f32;
+  APInt DefBits(VecTy.getSizeInBits(), 0);
+  APInt UndefBits(VecTy.getSizeInBits(), 0);
+  SDValue Parts[2];
+  for (int Elt=0; Elt< 2; Elt++)
+    Parts[Elt] = RHS;
+  SDValue DupVal = DAG.getNode(ISD::BUILD_VECTOR, dl, MVT::v2f32, Parts);
+
+  BuildVectorSDNode *BVN = cast<BuildVectorSDNode>(DupVal.getNode());
+  if (!resolveBuildVector(BVN, DefBits, UndefBits))
+    return SDValue();
+
+  SDValue NewOp = tryAdvSIMDModImm32(AArch64ISD::MOVIshift,
+                                     DupVal, DAG, DefBits);
+  SDValue Elt = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, VT, NewOp,
+                            DAG.getConstant(0, dl, MVT::i32));
+
+  DAG.ReplaceAllUsesWith(RHS, Elt);
+  return Op;
+}
+
 bool AArch64TargetLowering::mergeStoresAfterLegalization(EVT VT) const {
   return !Subtarget->useSVEForFixedLengthVectors();
 }

 bool AArch64TargetLowering::useSVEForFixedLengthVectorVT(
     EVT VT, bool OverrideNEON) const {
   if (!VT.isFixedLengthVector())
     return false;

Lint: Pre-merge checks: clang-format: please reformat the code (reported on the tryAdvSIMDModImm32 declaration, the hasNUsesOfValue condition, the Elt loop, and the tryAdvSIMDModImm32 call).
[... 4,873 lines hidden ...] if (Bits.getHiBits(64) == Bits.getLoBits(64)) {
     }
   }

   return SDValue();
 }

 // Try 32-bit splatted SIMD immediate.
 static SDValue tryAdvSIMDModImm32(unsigned NewOp, SDValue Op, SelectionDAG &DAG,
                                   const APInt &Bits,
-                                  const SDValue *LHS = nullptr) {
+                                  const SDValue *LHS) {
   if (Bits.getHiBits(64) == Bits.getLoBits(64)) {
     uint64_t Value = Bits.zextOrTrunc(64).getZExtValue();
     EVT VT = Op.getValueType();
     MVT MovTy = (VT.getSizeInBits() == 128) ? MVT::v4i32 : MVT::v2i32;
     bool isAdvSIMDModImm = false;
     uint64_t Shift;

     if ((isAdvSIMDModImm = AArch64_AM::isAdvSIMDModImmType1(Value))) {
[... 10,186 lines hidden ...]

Lint: Pre-merge checks: clang-format: please reformat the code (reported on the tryAdvSIMDModImm32 parameter list).

llvm/test/Transforms/LoopVectorize/AArch64/remat-const-float-simd.ll

This file was added.

+; RUN: llc < %s -mtriple=aarch64-none-linux-gnu -verify-machineinstrs | FileCheck %s
+
+; Check that float(0x7fffffff) can be rematerialized with simd instruction
+target triple = "aarch64-unknown-linux-gnu"
+
+; float foo(void) { return float(0x7fffffff); }
+define float @foo() {
+; CHECK: movi v0.2s, #79, lsl #24
+entry:
+  ret float 0x41E0000000000000
+}

This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Use simd mov to materialize big fp constants
ClosedPublic

