This is an archive of the discontinued LLVM Phabricator instance.

Differential D96700

[llvm][Aarch64][SVE] Remove extra fmov instruction with certain literals
Closed · Public

Authored by DavidTruby on Feb 15 2021, 4:09 AM.

Details

Reviewers

efriedma
paulwalker-arm
joechrisellis
peterwaller-arm
bsmith

Commits

rGe86f9ba15c41: [llvm][Aarch64][SVE] Remove extra fmov instruction with certain literals

Summary

When a literal that cannot fit in the immediate form of the fmov instruction
is used to initialise an SVE vector, an extra unnecessary fmov is currently
generated. This patch adds an extra codegen pattern preventing the extra
instruction from being generated.
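For context on which literals are affected: per the Arm architecture's rules, the 8-bit immediate form of FMOV (the same format SVE FDUP uses) can only encode values of the form ±(n/16) × 2^r with 16 ≤ n ≤ 31 and -3 ≤ r ≤ 4. The Python sketch below is a brute-force check of that rule, for illustration only; `fits_fmov_imm8` is a hypothetical helper, not an LLVM API. Under this rule 16.0 is encodable, which is consistent with the existing `dup_imm_f64` test expecting a single `mov z0.d, #16.00000000`, while 42.0, the literal used by the new tests, is not.

```python
from fractions import Fraction


def fits_fmov_imm8(x: float) -> bool:
    """Return True if finite x is encodable as an 8-bit FMOV/FDUP immediate.

    Assumed rule (Arm modified immediate): x == +/-(n / 16) * 2**r,
    with 16 <= n <= 31 and -3 <= r <= 4. Zero is therefore not encodable.
    """
    target = Fraction(x)  # exact rational value of the binary float
    for n in range(16, 32):
        for r in range(-3, 5):
            magnitude = Fraction(n, 16) * Fraction(2) ** r
            if target == magnitude or target == -magnitude:
                return True
    return False


for value in (16.0, 0.125, 31.0, 42.0):
    print(value, fits_fmov_imm8(value))
```

Running it prints `True` for 16.0, 0.125 and 31.0 (the largest encodable magnitude) and `False` for 42.0.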

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

DavidTruby created this revision. Feb 15 2021, 4:09 AM

Herald added a reviewer: efriedma. Feb 15 2021, 4:09 AM

Herald added subscribers: psnobl, hiraditya, kristof.beyls, tschuett.

DavidTruby requested review of this revision. Feb 15 2021, 4:09 AM

Herald added a project: Restricted Project. Feb 15 2021, 4:09 AM

Herald added a subscriber: llvm-commits.

DavidTruby added reviewers: paulwalker-arm, joechrisellis, peterwaller-arm, bsmith. Feb 15 2021, 4:10 AM

georges added a subscriber: georges. Feb 15 2021, 4:15 AM

georges added inline comments.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
559	do we also need patterns for f16/f64?

Patch looks good to me. I'll leave it up to you whether you want to extend the patch to cover f16/f64 cases or defer until needed.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
559	paulwalker-arm: I would say "want" rather than "need", given this is more optimisation than function.
llvm/test/CodeGen/AArch64/sve-intrinsics-dup-x.ll
133–139	paulwalker-arm: Keep it if you want, but this `vscale x 2 x float` test is kind of unnecessary because the intrinsics only expect to operate on fully packed vectors. The fact that some of them work with unpacked types is really a quirk of some intrinsics reusing the ISD nodes used for stock LLVM IR.

This revision is now accepted and ready to land. Feb 15 2021, 4:46 AM

Harbormaster completed remote builds in B89209: Diff 323711. Feb 15 2021, 5:14 AM

Added equivalent f64 patterns

The f64 case is similar enough that adding those patterns here is fine; f16 seems to take
a different codepath and generate very different code, so I'd rather change that separately
when it comes up (if it proves necessary).

DavidTruby marked an inline comment as done. Feb 15 2021, 5:32 AM

DavidTruby added inline comments. Feb 15 2021, 5:38 AM

llvm/test/CodeGen/AArch64/sve-intrinsics-dup-x.ll
133–139	If it's not likely to become disallowed in future, I'd rather leave it in, on the basis that more tests are usually better in my opinion. I can remove it if it's something that shouldn't be allowed, or might not be allowed, in future though.

LGTM, I'll let @paulwalker-arm accept.

Harbormaster completed remote builds in B89216: Diff 323722. Feb 15 2021, 5:42 AM

david-arm added a subscriber: david-arm. Feb 15 2021, 6:05 AM

david-arm added inline comments.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
560	It looks like this probably needs a `let AddedComplexity = 2` like the patterns below. Also, should we be adding patterns for `half` and `double` too?

paulwalker-arm requested changes to this revision. Feb 15 2021, 6:10 AM

paulwalker-arm added inline comments.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
566–567	`nxv4f64` is not a legal type and thus this pattern will have no effect.
llvm/test/CodeGen/AArch64/sve-intrinsics-dup-x.ll
157–164	Whilst I can begrudgingly accept the `vscale x 2 x float` test, this `vscale x 4 x double` test should definitely be removed as it is not linked to any of the new patterns.
llvm/test/CodeGen/AArch64/sve-vector-splat.ll
404–405	I'd remove this test also, but if you keep it then the CHECK lines need updating as `<vscale x 4 x double>` sits across two registers. Personally I'd just remove it, as it's not testing anything that is not already tested by splat_nxv2f64_fmov_fold.
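The arithmetic behind the `nxv4f64` comments can be sketched as follows, assuming LLVM's AArch64 mapping in which one SVE Z register holds `vscale` × 128 bits. A `<vscale x N x T>` type then carries N × bits(T) / 128 registers' worth of data: `<vscale x 4 x double>` needs two, so type legalisation splits it into two `nxv2f64` values before instruction selection and a pattern written for `nxv4f64` never matches, whereas `<vscale x 2 x float>` is an unpacked type carrying only half a register's worth of data. `z_registers` is a hypothetical helper that only performs this calculation.

```python
from fractions import Fraction

# Assumption for this sketch: one SVE Z register holds vscale * 128 bits.
SVE_GRANULE_BITS = 128


def z_registers(min_elts: int, elt_bits: int) -> Fraction:
    """Registers' worth of data in a <vscale x min_elts x T> type, T being elt_bits wide."""
    return Fraction(min_elts * elt_bits, SVE_GRANULE_BITS)


for name, n, bits in [("nxv4f32", 4, 32), ("nxv2f64", 2, 64),
                      ("nxv2f32", 2, 32), ("nxv4f64", 4, 64)]:
    print(name, z_registers(n, bits))
```

It prints `1` for `nxv4f32` and `nxv2f64`, `1/2` for `nxv2f32` and `2` for `nxv4f64`.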

This revision now requires changes to proceed. Feb 15 2021, 6:10 AM

paulwalker-arm added inline comments. Feb 15 2021, 6:18 AM

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
560	@david-arm: What have you spotted that suggests this? The tests suggest AddedComplexity is not required?

david-arm added inline comments. Feb 15 2021, 6:21 AM

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
560	Well, I guess I haven't seen specific tests that require this. I just thought it would be consistent with the patterns below, that's all, but maybe I've just misunderstood something!

DavidTruby added inline comments. Feb 15 2021, 6:29 AM

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
560	Adding it seems to break a number of other, unrelated tests, so I think it's best to leave it out.
566–567	My mistake, I must not have been thinking very hard when I added these!

Remove faulty f64 patterns

DavidTruby marked 3 inline comments as done. Feb 15 2021, 6:34 AM

paulwalker-arm accepted this revision. Feb 15 2021, 6:41 AM

paulwalker-arm added inline comments.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
562	Rogue indent.

This revision is now accepted and ready to land. Feb 15 2021, 6:41 AM

Harbormaster completed remote builds in B89221: Diff 323738. Feb 15 2021, 7:23 AM

Closed by commit rGe86f9ba15c41: [llvm][Aarch64][SVE] Remove extra fmov instruction with certain literals (authored by DavidTruby). Feb 16 2021, 6:17 AM

This revision was automatically updated to reflect the committed changes.

DavidTruby added a commit: rGe86f9ba15c41: [llvm][Aarch64][SVE] Remove extra fmov instruction with certain literals.

DavidTruby marked an inline comment as done. Feb 16 2021, 6:18 AM

DavidTruby added inline comments.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
562	fixed in committed version

Revision Contents

Path                                                  Size
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td        10 lines
llvm/test/CodeGen/AArch64/sve-intrinsics-dup-x.ll     35 lines
llvm/test/CodeGen/AArch64/sve-vector-splat.ll         38 lines

Diff 323722

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

 def : Pat<(nxv16i8 (AArch64dup (i32 (SVE8BitLslImm i32:$a, i32:$b)))),
           (DUP_ZI_B $a, $b)>;
 def : Pat<(nxv8i16 (AArch64dup (i32 (SVE8BitLslImm i32:$a, i32:$b)))),
           (DUP_ZI_H $a, $b)>;
 def : Pat<(nxv4i32 (AArch64dup (i32 (SVE8BitLslImm i32:$a, i32:$b)))),
           (DUP_ZI_S $a, $b)>;
 def : Pat<(nxv2i64 (AArch64dup (i64 (SVE8BitLslImm i32:$a, i32:$b)))),
           (DUP_ZI_D $a, $b)>;
 
+// Duplicate immediate FP into all vector elements.
+def : Pat<(nxv2f32 (AArch64dup (f32 fpimm:$val))),
+          (DUP_ZR_S (MOVi32imm (bitcast_fpimm_to_i32 f32:$val)))>;
+def : Pat<(nxv4f32 (AArch64dup (f32 fpimm:$val))),
+          (DUP_ZR_S (MOVi32imm (bitcast_fpimm_to_i32 f32:$val)))>;
+def : Pat<(nxv2f64 (AArch64dup (f64 fpimm:$val))),
+          (DUP_ZR_D (MOVi64imm (bitcast_fpimm_to_i64 f64:$val)))>;
+def : Pat<(nxv4f64 (AArch64dup (f64 fpimm:$val))),
+          (DUP_ZR_D (MOVi64imm (bitcast_fpimm_to_i64 f64:$val)))>;
+
 // Duplicate FP immediate into all vector elements
 let AddedComplexity = 2 in {
 def : Pat<(nxv8f16 (AArch64dup fpimm16:$imm8)),
           (FDUP_ZI_H fpimm16:$imm8)>;
 def : Pat<(nxv4f16 (AArch64dup fpimm16:$imm8)),
           (FDUP_ZI_H fpimm16:$imm8)>;
 def : Pat<(nxv2f16 (AArch64dup fpimm16:$imm8)),
           (FDUP_ZI_H fpimm16:$imm8)>;
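The new patterns avoid the extra fmov by building the literal's raw bit pattern in a general-purpose register (`MOVi32imm`/`MOVi64imm` applied to `bitcast_fpimm_to_i32`/`bitcast_fpimm_to_i64`) and broadcasting it with `DUP_ZR_S`/`DUP_ZR_D`. As a quick cross-check done in Python rather than LLVM, the sketch below reinterprets 42.0 as IEEE-754 single and double bits using the standard `struct` module; `f32_bits` and `f64_bits` are illustrative helpers. The results match the immediates in the tests' CHECK lines, `mov w8, #1109917696` and `mov x8, #4631107791820423168`.

```python
import struct


def f32_bits(x: float) -> int:
    """Raw IEEE-754 binary32 bits of x, as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def f64_bits(x: float) -> int:
    """Raw IEEE-754 binary64 bits of x, as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


print(f32_bits(42.0), hex(f32_bits(42.0)))  # 1109917696 0x42280000
print(f64_bits(42.0), hex(f64_bits(42.0)))  # 4631107791820423168 0x4045000000000000
```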

llvm/test/CodeGen/AArch64/sve-intrinsics-dup-x.ll

 define <vscale x 2 x double> @dup_imm_f64(double %b) {
 ; CHECK-LABEL: dup_imm_f64:
 ; CHECK: mov z0.d, #16.00000000
 ; CHECK-NEXT: ret
   %out = call <vscale x 2 x double> @llvm.aarch64.sve.dup.x.nxv2f64(double 16.)
   ret <vscale x 2 x double> %out
 }
 
+define <vscale x 2 x float> @dup_fmov_imm_f32_2() {
+; CHECK-LABEL: dup_fmov_imm_f32_2:
+; CHECK: mov w8, #1109917696
+; CHECK-NEXT: mov z0.s, w8
+  %out = tail call <vscale x 2 x float> @llvm.aarch64.sve.dup.x.nxv2f32(float 4.200000e+01)
+  ret <vscale x 2 x float> %out
+}
+
+define <vscale x 4 x float> @dup_fmov_imm_f32_4() {
+; CHECK-LABEL: dup_fmov_imm_f32_4:
+; CHECK: mov w8, #1109917696
+; CHECK-NEXT: mov z0.s, w8
+  %out = tail call <vscale x 4 x float> @llvm.aarch64.sve.dup.x.nxv4f32(float 4.200000e+01)
+  ret <vscale x 4 x float> %out
+}
+
+define <vscale x 2 x double> @dup_fmov_imm_f64_2() {
+; CHECK-LABEL: dup_fmov_imm_f64_2:
+; CHECK: mov x8, #4631107791820423168
+; CHECK-NEXT: mov z0.d, x8
+  %out = tail call <vscale x 2 x double> @llvm.aarch64.sve.dup.x.nxv2f64(double 4.200000e+01)
+  ret <vscale x 2 x double> %out
+}
+
+define <vscale x 4 x double> @dup_fmov_imm_f64_4() {
+; CHECK-LABEL: dup_fmov_imm_f64_4:
+; CHECK: mov x8, #4631107791820423168
+; CHECK-NEXT: mov z0.d, x8
+  %out = tail call <vscale x 4 x double> @llvm.aarch64.sve.dup.x.nxv4f64(double 4.200000e+01)
+  ret <vscale x 4 x double> %out
+}
+
 declare <vscale x 16 x i8> @llvm.aarch64.sve.dup.x.nxv16i8( i8)
 declare <vscale x 8 x i16> @llvm.aarch64.sve.dup.x.nxv8i16(i16)
 declare <vscale x 4 x i32> @llvm.aarch64.sve.dup.x.nxv4i32(i32)
 declare <vscale x 2 x i64> @llvm.aarch64.sve.dup.x.nxv2i64(i64)
 declare <vscale x 8 x half> @llvm.aarch64.sve.dup.x.nxv8f16(half)
 declare <vscale x 8 x bfloat> @llvm.aarch64.sve.dup.x.nxv8bf16(bfloat)
+declare <vscale x 2 x float> @llvm.aarch64.sve.dup.x.nxv2f32(float)
 declare <vscale x 4 x float> @llvm.aarch64.sve.dup.x.nxv4f32(float)
 declare <vscale x 2 x double> @llvm.aarch64.sve.dup.x.nxv2f64(double)
+declare <vscale x 4 x double> @llvm.aarch64.sve.dup.x.nxv4f64(double)
 
 ; +bf16 is required for the bfloat version.
 attributes #0 = { "target-features"="+sve,+bf16" }

llvm/test/CodeGen/AArch64/sve-vector-splat.ll

 define <vscale x 4 x float> @splat_nxv4f32_fold(<vscale x 4 x float> %x) {
 ; CHECK-LABEL: splat_nxv4f32_fold:
 ; CHECK: mov z0.s, #0
 ; CHECK-NEXT: ret
   %r = fsub nnan <vscale x 4 x float> %x, %x
   ret <vscale x 4 x float> %r
 }
 
+define <vscale x 2 x float> @splat_nxv2f32_fmov_fold() {
+; CHECK-LABEL: splat_nxv2f32_fmov_fold
+; CHECK: mov w8, #1109917696
+; CHECK-NEXT: mov z0.s, w8
+  %1 = insertelement <vscale x 2 x float> undef, float 4.200000e+01, i32 0
+  %2 = shufflevector <vscale x 2 x float> %1, <vscale x 2 x float> undef, <vscale x 2 x i32> zeroinitializer
+  ret <vscale x 2 x float> %2
+}
+
+define <vscale x 4 x float> @splat_nxv4f32_fmov_fold() {
+; CHECK-LABEL: splat_nxv4f32_fmov_fold
+; CHECK: mov w8, #1109917696
+; CHECK-NEXT: mov z0.s, w8
+  %1 = insertelement <vscale x 4 x float> undef, float 4.200000e+01, i32 0
+  %2 = shufflevector <vscale x 4 x float> %1, <vscale x 4 x float> undef, <vscale x 4 x i32> zeroinitializer
+  ret <vscale x 4 x float> %2
+}
+
+define <vscale x 2 x double> @splat_nxv2f64_fmov_fold() {
+; CHECK-LABEL: splat_nxv2f64_fmov_fold
+; CHECK: mov x8, #4631107791820423168
+; CHECK-NEXT: mov z0.d, x8
+  %1 = insertelement <vscale x 2 x double> undef, double 4.200000e+01, i32 0
+  %2 = shufflevector <vscale x 2 x double> %1, <vscale x 2 x double> undef, <vscale x 2 x i32> zeroinitializer
+  ret <vscale x 2 x double> %2
+}
+
+define <vscale x 4 x double> @splat_nxv4f64_fmov_fold() {
+; CHECK-LABEL: splat_nxv4f64_fmov_fold
+; CHECK: mov x8, #4631107791820423168
+; CHECK-NEXT: mov z0.d, x8
+  %1 = insertelement <vscale x 4 x double> undef, double 4.200000e+01, i32 0
+  %2 = shufflevector <vscale x 4 x double> %1, <vscale x 4 x double> undef, <vscale x 4 x i32> zeroinitializer
+  ret <vscale x 4 x double> %2
+}
+
 
 ; +bf16 is required for the bfloat version.
 attributes #0 = { "target-features"="+sve,+bf16" }