This is an archive of the discontinued LLVM Phabricator instance.

Differential D144413

[InstCombine] Extend SVEVectorFuseMulAddSub to support newly added "undef" intrinsics.
Closed, Public

Authored by paulwalker-arm on Feb 20 2023, 10:23 AM.

Details

Reviewers

MattDevereau
kmclaughlin
sdesmalen

Commits

rG2f887c9a760d: [InstCombine] Extend SVEVectorFuseMulAddSub to support newly added "undef"…

Summary

D143767 will change the intrinsics used to lower the floating-point
svadd_x, svmul_x and svsub_x builtins. As a result, the combines
added as part of D140200 will no longer fire in all cases.
This patch extends the existing contraction combines to cover the
fadd_u, fmul_u and fsub_u intrinsics.
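
As a rough illustration of the rewrite this patch enables, the sketch below models the contraction on a toy expression tree. Nothing here is LLVM API: `Expr`, `leaf`, `call` and `fuseMulAddSub` are hypothetical names, and the model ignores any further legality checks the real combine may perform. The operand order follows the CHECK lines in the test file, where `fadd_u(pg, c, fmul_u(pg, a, b))` becomes `fmla_u(pg, c, a, b)`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy expression node: a leaf value (no operands) or an intrinsic call.
// Hypothetical model only; this is not LLVM's IR representation.
struct Expr {
  std::string name;
  std::vector<std::shared_ptr<Expr>> ops;
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr leaf(const std::string &name) {
  return std::make_shared<Expr>(Expr{name, {}});
}

ExprPtr call(const std::string &name, std::vector<ExprPtr> ops) {
  return std::make_shared<Expr>(Expr{name, std::move(ops)});
}

// Rewrites  addSub(pg, c, fmul_u(pg, a, b))  into  fused(pg, c, a, b).
// Returns nullptr when the pattern does not match.
ExprPtr fuseMulAddSub(const ExprPtr &e, const std::string &addSubName,
                      const std::string &fusedName) {
  assert(e && "expression must not be null");
  if (e->name != addSubName || e->ops.size() != 3)
    return nullptr;
  const ExprPtr &pg = e->ops[0];
  const ExprPtr &addend = e->ops[1];
  const ExprPtr &mul = e->ops[2];
  if (mul->name != "fmul_u" || mul->ops.size() != 3)
    return nullptr;
  // Both operations must be governed by the same predicate value.
  if (mul->ops[0] != pg)
    return nullptr;
  return call(fusedName, {pg, addend, mul->ops[1], mul->ops[2]});
}
```

Calling `fuseMulAddSub(e, "fsub_u", "fmls_u")` models the subtraction case in the same way; an `fmul` (merging) multiply is deliberately left unmatched here, mirroring the fact that the new cases only name `fmul_u`.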

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

paulwalker-arm created this revision. Feb 20 2023, 10:23 AM

Herald added a project: Restricted Project. Feb 20 2023, 10:23 AM

Herald added subscribers: hiraditya, tschuett.

paulwalker-arm requested review of this revision. Feb 20 2023, 10:23 AM

Herald added a subscriber: llvm-commits.

paulwalker-arm edited the summary of this revision. Feb 20 2023, 10:24 AM

There are likely other combines that will also need to be updated, but this is the most important one blocking D143767. I'd like to take a more holistic look at the others as part of work to unify some code paths. For example, I want to canonicalise SVE intrinsic calls whose predicate is all active to use the "undef" intrinsics, so that some code duplication within the code generator can be removed. That said, please shout if you know of something critical that should be handled before D143767 lands.

Matt added a subscriber: Matt. Feb 20 2023, 1:21 PM

ping

MattDevereau added inline comments. Mar 9 2023, 11:14 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1622–1625	Is this case separate from `instCombineSVEVectorAdd` because the "undef" variant can't combine to fmla_u unless both the fmul and fadd are of the _u variants? Or because this case can't benefit from combines in `instCombineSVEVectorBinOp`?

paulwalker-arm added inline comments. Mar 10 2023, 3:04 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1622–1625	The formerish. `fadd_m(pg, a, fmul_u(pg, b, c))` expects the inactive elements to come from `a`, which `fmla_u` does not guarantee. It's worth pointing out that the opposite is a valid transformation (i.e. `fadd_u(pg, a, fmul_m(pg, b, c)) --> fmla_u(a, b, c)`), but that's new and I have half a thought that it'll be better to soften the `fmul_m` to `fmul_u` rather than jumping straight to `fmla_u`. This does mean we're not getting the benefit of `instCombineSVEVectorBinOp`, but here my plan is to rewrite `m` intrinsics that take an all active predicate to their equivalent `u` intrinsic, to minimise duplication.
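
The distinction between the merging (`_m`) and "undef" (`_u`) forms can be made concrete with a small lane-wise model. This is a hand-written sketch, not LLVM or ACLE code: `Lane`, `Vec`, `Pred`, `refines` and the functions named after the intrinsics are hypothetical stand-ins, an inactive `_u` lane is modelled as `std::nullopt`, and the fused multiply-add is approximated by an unfused add of a multiply. A rewrite is treated as acceptable when every lane the original defines is reproduced by the replacement.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// A lane either holds a defined value or is undefined (std::nullopt).
using Lane = std::optional<double>;
using Vec = std::vector<Lane>;
using Pred = std::vector<bool>;

enum class Inactive { Merge, Undef };

// Applies op to active lanes. Inactive lanes take the first data operand
// (Merge) or are left undefined (Undef).
template <typename Op>
Vec predicatedBinOp(const Pred &pg, const Vec &lhs, const Vec &rhs,
                    Inactive mode, Op op) {
  assert(lhs.size() == pg.size() && rhs.size() == pg.size());
  Vec result(pg.size()); // every lane starts out undefined
  for (std::size_t i = 0; i < pg.size(); ++i) {
    if (pg[i])
      result[i] = (lhs[i] && rhs[i]) ? Lane(op(*lhs[i], *rhs[i])) : Lane();
    else if (mode == Inactive::Merge)
      result[i] = lhs[i];
  }
  return result;
}

Vec fmul_m(const Pred &pg, const Vec &a, const Vec &b) {
  return predicatedBinOp(pg, a, b, Inactive::Merge,
                         [](double x, double y) { return x * y; });
}
Vec fmul_u(const Pred &pg, const Vec &a, const Vec &b) {
  return predicatedBinOp(pg, a, b, Inactive::Undef,
                         [](double x, double y) { return x * y; });
}
Vec fadd_m(const Pred &pg, const Vec &a, const Vec &b) {
  return predicatedBinOp(pg, a, b, Inactive::Merge,
                         [](double x, double y) { return x + y; });
}
Vec fadd_u(const Pred &pg, const Vec &a, const Vec &b) {
  return predicatedBinOp(pg, a, b, Inactive::Undef,
                         [](double x, double y) { return x + y; });
}
// Modelled as an unfused acc + b * c with undefined inactive lanes.
Vec fmla_u(const Pred &pg, const Vec &acc, const Vec &b, const Vec &c) {
  return fadd_u(pg, acc, fmul_u(pg, b, c));
}

// True when every lane defined in `original` has the same defined value in
// `replacement`; undefined lanes in `original` may become anything.
bool refines(const Vec &original, const Vec &replacement) {
  if (original.size() != replacement.size())
    return false;
  for (std::size_t i = 0; i < original.size(); ++i)
    if (original[i] && original[i] != replacement[i])
      return false;
  return true;
}
```

With a predicate such as `{true, false}`, `fadd_m(pg, a, fmul_u(pg, b, c))` defines the inactive lane as the value from `a`, whereas `fmla_u(pg, a, b, c)` leaves it undefined, so `refines` rejects that rewrite. Starting from `fadd_u(pg, a, fmul_m(pg, b, c))`, the inactive lane is already undefined, and the same replacement is accepted.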

MattDevereau accepted this revision. Mar 10 2023, 3:15 AM

This revision is now accepted and ready to land. Mar 10 2023, 3:15 AM

This revision was landed with ongoing or failed builds. Mar 12 2023, 4:31 AM

Closed by commit rG2f887c9a760d: [InstCombine] Extend SVEVectorFuseMulAddSub to support newly added "undef"… (authored by paulwalker-arm).

This revision was automatically updated to reflect the committed changes.

paulwalker-arm added a commit: rG2f887c9a760d: [InstCombine] Extend SVEVectorFuseMulAddSub to support newly added "undef"….

jolanta.jensen mentioned this in D152005: [SVE ACLE] Implement IR combines to convert intrinsics used for _m C/C++ builtins. Jun 19 2023, 9:27 AM

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp	8 lines
llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-muladdsub.ll	27 lines

Diff 504420

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

[1,613 lines not shown] AArch64TTIImpl::instCombineIntrinsic(InstCombiner &IC,
   case Intrinsic::aarch64_sve_ptest_last:
     return instCombineSVEPTest(IC, II);
   case Intrinsic::aarch64_sve_mul:
   case Intrinsic::aarch64_sve_fmul:
     return instCombineSVEVectorMul(IC, II);
   case Intrinsic::aarch64_sve_fadd:
   case Intrinsic::aarch64_sve_add:
     return instCombineSVEVectorAdd(IC, II);
+  case Intrinsic::aarch64_sve_fadd_u:
+    return instCombineSVEVectorFuseMulAddSub<Intrinsic::aarch64_sve_fmul_u,
+                                             Intrinsic::aarch64_sve_fmla_u>(
+        IC, II, true);
   case Intrinsic::aarch64_sve_fsub:
   case Intrinsic::aarch64_sve_sub:
     return instCombineSVEVectorSub(IC, II);
+  case Intrinsic::aarch64_sve_fsub_u:
+    return instCombineSVEVectorFuseMulAddSub<Intrinsic::aarch64_sve_fmul_u,
+                                             Intrinsic::aarch64_sve_fmls_u>(
+        IC, II, true);
   case Intrinsic::aarch64_sve_tbl:
     return instCombineSVETBL(IC, II);
   case Intrinsic::aarch64_sve_uunpkhi:
   case Intrinsic::aarch64_sve_uunpklo:
   case Intrinsic::aarch64_sve_sunpkhi:
   case Intrinsic::aarch64_sve_sunpklo:
     return instCombineSVEUnpack(IC, II);
   case Intrinsic::aarch64_sve_zip1:
[1,781 lines not shown]

llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-muladdsub.ll

[9 lines not shown]
 ; CHECK-NEXT: ret <vscale x 8 x half> [[TMP2]]
 ;
   %1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> %p)
   %2 = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fmul.nxv8f16(<vscale x 8 x i1> %1, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
   %3 = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fadd.nxv8f16(<vscale x 8 x i1> %1, <vscale x 8 x half> %c, <vscale x 8 x half> %2)
   ret <vscale x 8 x half> %3
 }

+define <vscale x 8 x half> @combine_fmla_u(<vscale x 16 x i1> %p, <vscale x 8 x half> %a, <vscale x 8 x half> %b, <vscale x 8 x half> %c) #0 {
+; CHECK-LABEL: @combine_fmla_u(
+; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[P:%.*]])
+; CHECK-NEXT: [[TMP2:%.*]] = call fast <vscale x 8 x half> @llvm.aarch64.sve.fmla.u.nxv8f16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x half> [[C:%.*]], <vscale x 8 x half> [[A:%.*]], <vscale x 8 x half> [[B:%.*]])
+; CHECK-NEXT: ret <vscale x 8 x half> [[TMP2]]
+;
+  %1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> %p)
+  %2 = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fmul.u.nxv8f16(<vscale x 8 x i1> %1, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
+  %3 = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fadd.u.nxv8f16(<vscale x 8 x i1> %1, <vscale x 8 x half> %c, <vscale x 8 x half> %2)
+  ret <vscale x 8 x half> %3
+}
+
 define <vscale x 16 x i8> @combine_mla_i8(<vscale x 16 x i1> %p, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c) #0 {
 ; CHECK-LABEL: @combine_mla_i8(
 ; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 16 x i8> @llvm.aarch64.sve.mla.nxv16i8(<vscale x 16 x i1> [[P:%.*]], <vscale x 16 x i8> [[C:%.*]], <vscale x 16 x i8> [[A:%.*]], <vscale x 16 x i8> [[B:%.*]])
 ; CHECK-NEXT: ret <vscale x 16 x i8> [[TMP1]]
 ;
   %1 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.mul.nxv16i8(<vscale x 16 x i1> %p, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
   %2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.add.nxv16i8(<vscale x 16 x i1> %p, <vscale x 16 x i8> %c, <vscale x 16 x i8> %1)
   ret <vscale x 16 x i8> %2
[28 lines not shown]
 ; CHECK-NEXT: ret <vscale x 8 x half> [[TMP2]]
 ;
   %1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> %p)
   %2 = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fmul.nxv8f16(<vscale x 8 x i1> %1, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
   %3 = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fsub.nxv8f16(<vscale x 8 x i1> %1, <vscale x 8 x half> %c, <vscale x 8 x half> %2)
   ret <vscale x 8 x half> %3
 }

+define <vscale x 8 x half> @combine_fmls_u(<vscale x 16 x i1> %p, <vscale x 8 x half> %a, <vscale x 8 x half> %b, <vscale x 8 x half> %c) #0 {
+; CHECK-LABEL: @combine_fmls_u(
+; CHECK-NEXT: [[TMP1:%.*]] = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> [[P:%.*]])
+; CHECK-NEXT: [[TMP2:%.*]] = call fast <vscale x 8 x half> @llvm.aarch64.sve.fmls.u.nxv8f16(<vscale x 8 x i1> [[TMP1]], <vscale x 8 x half> [[C:%.*]], <vscale x 8 x half> [[A:%.*]], <vscale x 8 x half> [[B:%.*]])
+; CHECK-NEXT: ret <vscale x 8 x half> [[TMP2]]
+;
+  %1 = tail call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> %p)
+  %2 = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fmul.u.nxv8f16(<vscale x 8 x i1> %1, <vscale x 8 x half> %a, <vscale x 8 x half> %b)
+  %3 = tail call fast <vscale x 8 x half> @llvm.aarch64.sve.fsub.u.nxv8f16(<vscale x 8 x i1> %1, <vscale x 8 x half> %c, <vscale x 8 x half> %2)
+  ret <vscale x 8 x half> %3
+}
+
 define <vscale x 16 x i8> @combine_mls_i8(<vscale x 16 x i1> %p, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b, <vscale x 16 x i8> %c) #0 {
 ; CHECK-LABEL: @combine_mls_i8(
 ; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 16 x i8> @llvm.aarch64.sve.mls.nxv16i8(<vscale x 16 x i1> [[P:%.*]], <vscale x 16 x i8> [[C:%.*]], <vscale x 16 x i8> [[A:%.*]], <vscale x 16 x i8> [[B:%.*]])
 ; CHECK-NEXT: ret <vscale x 16 x i8> [[TMP1]]
 ;
   %1 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.mul.nxv16i8(<vscale x 16 x i1> %p, <vscale x 16 x i8> %a, <vscale x 16 x i8> %b)
   %2 = tail call <vscale x 16 x i8> @llvm.aarch64.sve.sub.nxv16i8(<vscale x 16 x i1> %p, <vscale x 16 x i8> %c, <vscale x 16 x i8> %1)
   ret <vscale x 16 x i8> %2
[98 lines not shown]

 declare <vscale x 16 x i1> @llvm.aarch64.sve.ptrue.nxv16i1(i32)
 declare <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1>)
 declare <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv4i1(<vscale x 16 x i1>)
 declare <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv2i1(<vscale x 16 x i1>)
 declare <vscale x 8 x half> @llvm.aarch64.sve.fmul.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
 declare <vscale x 8 x half> @llvm.aarch64.sve.fadd.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
 declare <vscale x 8 x half> @llvm.aarch64.sve.fsub.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
+declare <vscale x 8 x half> @llvm.aarch64.sve.fmul.u.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
+declare <vscale x 8 x half> @llvm.aarch64.sve.fadd.u.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
+declare <vscale x 8 x half> @llvm.aarch64.sve.fsub.u.nxv8f16(<vscale x 8 x i1>, <vscale x 8 x half>, <vscale x 8 x half>)
 declare <vscale x 16 x i8> @llvm.aarch64.sve.mul.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
 declare <vscale x 16 x i8> @llvm.aarch64.sve.add.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)
 declare <vscale x 16 x i8> @llvm.aarch64.sve.sub.nxv16i8(<vscale x 16 x i1>, <vscale x 16 x i8>, <vscale x 16 x i8>)

 attributes #0 = { "target-features"="+sve" }