A single smov instruction can move an element out of a vector register and sign-extend it as part of the same move, rather than performing each step with a separate instruction.
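As a rough illustration (my own sketch, not code from the patch), the IR shape these patterns target is a lane extract whose result is immediately sign-extended; with the patterns in place the backend should be able to select one smov (for example smov w0, v0.h[3]) instead of a plain lane move followed by a separate sign-extend:

```llvm
; Illustrative only, assuming +sve: extract a halfword lane from the low
; 128 bits of a scalable vector and sign-extend it to i32 in one step.
target triple = "aarch64-unknown-linux-gnu"

define i32 @extract_sext_h(<vscale x 8 x i16> %a) #0 {
  %elt = extractelement <vscale x 8 x i16> %a, i32 3
  %conv = sext i16 %elt to i32
  ret i32 %conv
}

attributes #0 = { "target-features"="+sve" }
```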
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
| llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td | |
|---|---|
| 2496–2499 | These should be using VectorIndexB. |
| 2501–2504 | These should only use VectorIndexH. |
| 2506–2507 | This should be using VectorIndexS. |
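For reference, these operand classes bound the lane index the indexed smov encodings accept: VectorIndexB covers byte lanes 0–15, VectorIndexH halfword lanes 0–7, and VectorIndexS word lanes 0–3, i.e. the lanes that fit in the 128-bit low part of the register. A hypothetical IR shape exercising the word-sized case (my own sketch, not from the review) would be:

```llvm
; Hypothetical sketch: a word-lane extract sign-extended to i64. Lane 3 is
; the last index VectorIndexS can encode, so this should still be able to
; use a single indexed smov (the 64-bit destination form); lane 4 would not
; fit the indexed encoding.
target triple = "aarch64-unknown-linux-gnu"

define i64 @extract_sext_s(<vscale x 4 x i32> %a) #0 {
  %elt = extractelement <vscale x 4 x i32> %a, i32 3
  %conv = sext i32 %elt to i64
  ret i64 %conv
}

attributes #0 = { "target-features"="+sve" }
```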
| llvm/test/CodeGen/AArch64/aarch64-smov-gen.ll | |
|---|---|
| 5–17 | Please simplify the tests. For example: |

```llvm
target triple = "aarch64-unknown-linux-gnu"

define i32 @extract_s8(<vscale x 16 x i8> %a) #0 {
  %elt = extractelement <vscale x 16 x i8> %a, i32 15
  %conv = sext i8 %elt to i32
  ret i32 %conv
}

attributes #0 = { "target-features"="+sve" }
```

Should be enough to test the new patterns. Given the VectorIndex# issues above I think it's worth having tests for out-of-range indices as well. I guess testing extract element VF-1 and extract element VF will cover the good and less good cases.
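A sketch of the out-of-range test being asked for (my own example, not from the patch): for <vscale x 16 x i8> the minimum vector length gives 16 byte lanes, so lane 15 (VF-1) is the last index the indexed smov form can encode, while lane 16 (VF) falls outside VectorIndexB and needs a different lowering:

```llvm
; Hypothetical out-of-range case: lane 16 of <vscale x 16 x i8> cannot be
; encoded by the indexed smov form, so selection must fall back to some
; other lowering rather than the new patterns.
target triple = "aarch64-unknown-linux-gnu"

define i32 @extract_s8_oob(<vscale x 16 x i8> %a) #0 {
  %elt = extractelement <vscale x 16 x i8> %a, i32 16
  %conv = sext i8 %elt to i32
  ret i32 %conv
}

attributes #0 = { "target-features"="+sve" }
```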
Addressed the comments and added test cases covering out-of-range indices.
The out-of-range tests show that with a few more patterns we can do better, but given they're not the common case I guess they can wait.
| llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td | |
|---|---|
| 2496–2507 | Can you move these patterns up a couple of blocks to be just after the UMOV variants as that's the block they relate to. |