This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Add FCMLA AArch64ISD node.
Needs Review · Public

Authored by fhahn on Nov 12 2020, 6:27 AM.

Details

Reviewers

dmgreen
samparker
paquette
t.p.northover

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 126806
Build 184149: arc lint + arc unit

Event Timeline

fhahn created this revision. Nov 12 2020, 6:27 AM

Herald added a project: Restricted Project. Nov 12 2020, 6:27 AM

Herald added subscribers: danielkiss, hiraditya, kristof.beyls.

fhahn requested review of this revision. Nov 12 2020, 6:27 AM

Harbormaster completed remote builds in B78611: Diff 304816. Nov 12 2020, 6:32 AM

Sounds OK, but I think there's such a thing as splitting up a patch too much! And if it's not possible to add tests for something, that can be a bad sign.

Complex MLA is something that exists in MVE too. I'm not sure what the other part of this looks like yet (I presume it's just matching), but it may be good in the long run to make some of this more target independent, so long as they work in the same way.

In D91346#2391341, @dmgreen wrote:

> Sounds OK, but I think there's such a thing as splitting up a patch too much! And if it's not possible to add tests for something, that can be a bad sign.

That's a good point! The patch that's using it is D91354, but it relies on a new intrinsic which is still up in the air. I posted on llvm-dev to restart the discussion about how to improve support for complex math: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146568.html

> Complex MLA is something that exists in MVE too. I'm not sure what the other part of this looks like yet (I presume it's just matching), but it may be good in the long run to make some of this more target independent, so long as they work in the same way.

If there are very similar instructions in other backends, it might make sense to introduce an independent ISD node in the future, but I am only familiar with the complex math instructions on AArch64 unfortunately.

jdoerfert added a subscriber: jdoerfert. Nov 12 2020, 9:35 AM

huntergr added a subscriber: huntergr. Nov 13 2020, 2:34 AM

OK sure. I was expecting some ISel lowering, to be honest. And perhaps for a vplan patch to appear :)

At a concrete level, matching in CodeGenPrep doesn't sound ideal, unless we expect these to spill over multiple basic blocks a lot of the time. At the moment we could get the same effect by matching in ISel, like any other instruction.

If, as I suspect (/hope :)), the goal is to pattern match during vectorization and produce something better there, relying on a "complex multiply" intrinsic may not be optimal in terms of the patterns you can recognize. The VCMLA and VCMUL/VCADD operations are more general than that and can match other patterns. Things like conjugates and rotates can modify that. I can see how that would be harder to make look target independent though.
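To make the point about rotations concrete, here is a minimal scalar sketch in C++ of what a single FCMLA lane computes. It is based on a reading of the Arm instruction description, not on anything in this patch, and the names `Complex`, `fcmlaLane`, `complexMul`, and `conjMul` are hypothetical helpers introduced only for illustration. It assumes the real part sits in the even element and the imaginary part in the odd element of each pair.

```cpp
#include <array>
#include <cassert>

// One complex lane: element 0 is the real part, element 1 the imaginary part
// (assumed interleaved layout).
using Complex = std::array<float, 2>;

// Hypothetical scalar model of one FCMLA lane; this is not LLVM code.
// Returns acc updated with the partial product selected by rot (in degrees).
Complex fcmlaLane(Complex acc, Complex a, Complex b, int rot) {
  switch (rot) {
  case 0: // acc += a.re * b
    acc[0] += a[0] * b[0];
    acc[1] += a[0] * b[1];
    break;
  case 90: // acc += (i * a.im) * b
    acc[0] -= a[1] * b[1];
    acc[1] += a[1] * b[0];
    break;
  case 180: // acc -= a.re * b
    acc[0] -= a[0] * b[0];
    acc[1] -= a[0] * b[1];
    break;
  case 270: // acc -= (i * a.im) * b
    acc[0] += a[1] * b[1];
    acc[1] -= a[1] * b[0];
    break;
  default:
    assert(false && "rotation must be 0, 90, 180 or 270");
  }
  return acc;
}

// a * b: rotations 0 and 90 applied to a zero accumulator.
Complex complexMul(Complex a, Complex b) {
  return fcmlaLane(fcmlaLane(Complex{0.0f, 0.0f}, a, b, 0), a, b, 90);
}

// conj(a) * b: same first step, but rotation 270 instead of 90.
Complex conjMul(Complex a, Complex b) {
  return fcmlaLane(fcmlaLane(Complex{0.0f, 0.0f}, a, b, 0), a, b, 270);
}
```

With a = 1 + 2i and b = 3 + 4i, `complexMul` gives -5 + 10i and `conjMul` gives 11 - 2i. Only the second rotation differs between the two, which is one example of a pattern that a plain "complex multiply" intrinsic would not express directly.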

rebased, still WIP.

Harbormaster completed remote builds in B126806: Diff 376840. Oct 4 2021, 3:07 AM

fhahn added a child revision: D91354: [AArch64] Lower @llvm.complex.multiply using fcmla (WIP). Oct 4 2021, 3:15 AM

pengfei added a subscriber: pengfei. Oct 4 2021, 5:39 AM

Matt added a subscriber: Matt. Oct 5 2021, 12:46 PM

Doesn't make much sense to me having this on its own... merging with D91354 seems like the sensible choice.

Revision Contents

Path                                              Size
llvm/lib/Target/AArch64/AArch64ISelLowering.h     2 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp   1 line
llvm/lib/Target/AArch64/AArch64InstrInfo.td       8 lines

Diff 376840

llvm/lib/Target/AArch64/AArch64ISelLowering.h

 enum NodeType : unsigned {
 [...]
   CMLEz,
   CMLTz,
   FCMEQz,
   FCMGEz,
   FCMGTz,
   FCMLEz,
   FCMLTz,

+  FCMLA,
+
   // Vector across-lanes addition
   // Only the lower result lane is defined.
   SADDV,
   UADDV,

   // Vector halving addition
   SHADD,
   UHADD,
 [...]

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

 case AArch64ISD::FIRST_NUMBER:
 [...]
   MAKE_CASE(AArch64ISD::SBCS)
   MAKE_CASE(AArch64ISD::ANDS)
   MAKE_CASE(AArch64ISD::CCMP)
   MAKE_CASE(AArch64ISD::CCMN)
   MAKE_CASE(AArch64ISD::FCCMP)
   MAKE_CASE(AArch64ISD::FCMP)
   MAKE_CASE(AArch64ISD::STRICT_FCMP)
   MAKE_CASE(AArch64ISD::STRICT_FCMPE)
+  MAKE_CASE(AArch64ISD::FCMLA)
   MAKE_CASE(AArch64ISD::DUP)
   MAKE_CASE(AArch64ISD::DUPLANE8)
   MAKE_CASE(AArch64ISD::DUPLANE16)
   MAKE_CASE(AArch64ISD::DUPLANE32)
   MAKE_CASE(AArch64ISD::DUPLANE64)
   MAKE_CASE(AArch64ISD::MOVI)
   MAKE_CASE(AArch64ISD::MOVIshift)
   MAKE_CASE(AArch64ISD::MOVIedit)
 [...]

llvm/lib/Target/AArch64/AArch64InstrInfo.td

 [...]
 def SDT_AArch64Insr : SDTypeProfile<1, 2, [SDTCisVec<0>]>;
 def SDT_AArch64Zip : SDTypeProfile<1, 2, [SDTCisVec<0>,
                                           SDTCisSameAs<0, 1>,
                                           SDTCisSameAs<0, 2>]>;
 def SDT_AArch64MOVIedit : SDTypeProfile<1, 1, [SDTCisInt<1>]>;
 def SDT_AArch64MOVIshift : SDTypeProfile<1, 2, [SDTCisInt<1>, SDTCisInt<2>]>;
 def SDT_AArch64vecimm : SDTypeProfile<1, 3, [SDTCisVec<0>, SDTCisSameAs<0,1>,
                                              SDTCisInt<2>, SDTCisInt<3>]>;
+def SDT_AArch64vecfcmla : SDTypeProfile<1, 4, [SDTCisVec<0>, SDTCisSameAs<0,1>,
+                                               SDTCisSameAs<1,2>,
+                                               SDTCisSameAs<2,3>,
+                                               SDTCisInt<4>]>;
 def SDT_AArch64UnaryVec: SDTypeProfile<1, 1, [SDTCisVec<0>, SDTCisSameAs<0,1>]>;
 def SDT_AArch64ExtVec: SDTypeProfile<1, 3, [SDTCisVec<0>, SDTCisSameAs<0,1>,
                                             SDTCisSameAs<0,2>, SDTCisInt<3>]>;
 def SDT_AArch64vshift : SDTypeProfile<1, 2, [SDTCisSameAs<0,1>, SDTCisInt<2>]>;
 def SDT_AArch64Dot: SDTypeProfile<1, 3, [SDTCisVec<0>, SDTCisSameAs<0,1>,
                                          SDTCisVec<2>, SDTCisSameAs<2,3>]>;

 def SDT_AArch64vshiftinsert : SDTypeProfile<1, 3, [SDTCisVec<0>, SDTCisInt<3>,
 [...]
 let Predicates = [HasRCPC] in {
   // v8.3 Release Consistent Processor Consistent support, optional in v8.2.
   def LDAPRB : RCPCLoad<0b00, "ldaprb", GPR32>;
   def LDAPRH : RCPCLoad<0b01, "ldaprh", GPR32>;
   def LDAPRW : RCPCLoad<0b10, "ldapr", GPR32>;
   def LDAPRX : RCPCLoad<0b11, "ldapr", GPR64>;
 }

+def AArch64fcmla : SDNode<"AArch64ISD::FCMLA", SDT_AArch64vecfcmla>;
+
 // v8.3a complex add and multiply-accumulate. No predicate here, that is done
 // inside the multiclass as the FP16 versions need different predicates.
 defm FCMLA : SIMDThreeSameVectorTiedComplexHSD<1, 0b110, complexrotateop,
-                                               "fcmla", null_frag>;
+                                               "fcmla", AArch64fcmla>;
 defm FCADD : SIMDThreeSameVectorComplexHSD<1, 0b111, complexrotateopodd,
                                            "fcadd", null_frag>;
 defm FCMLA : SIMDIndexedTiedComplexHSD<0, 1, complexrotateop, "fcmla">;

 let Predicates = [HasComplxNum, HasNEON, HasFullFP16] in {
   def : Pat<(v4f16 (int_aarch64_neon_vcadd_rot90 (v4f16 V64:$Rn), (v4f16 V64:$Rm))),
             (FCADDv4f16 (v4f16 V64:$Rn), (v4f16 V64:$Rm), (i32 0))>;
   def : Pat<(v4f16 (int_aarch64_neon_vcadd_rot270 (v4f16 V64:$Rn), (v4f16 V64:$Rm))),
 [...]