This is an archive of the discontinued LLVM Phabricator instance.


Differential D153323

[AArch64] Try to fold uaddlv and uaddlp
Closed · Public

Authored by jaykang10 on Jun 20 2023, 2:27 AM.


Details

Reviewers

dmgreen
efriedma
t.p.northover

Commits

rGcce08185b4b5: [AArch64] Try to fold uaddlv and uaddlp

Summary

GCC generates fewer instructions than LLVM for the intrinsic example below.

#include <arm_neon.h>

unsigned foo(uint16x8_t b) {
    return vaddlvq_u32(vpadalq_u16(vdupq_n_u32(0), b));
}

GCC output:

foo:
	uaddlv	s31, v0.8h
	fmov	x0, d31
	ret

LLVM output:

foo:
	uaddlp	v0.4s, v0.8h
	uaddlv	d0, v0.4s
	fmov	x0, d0

We can fold uaddlv(uaddlp(x)) ==> uaddlv(x).
After adding a TableGen pattern for this, the LLVM output is as follows:

foo:
	uaddlv	s0, v0.8h
	fmov	x0, d0

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jaykang10 created this revision. · Jun 20 2023, 2:27 AM

Herald added a project: Restricted Project. · Jun 20 2023, 2:27 AM

Herald added subscribers: hiraditya, kristof.beyls.

jaykang10 requested review of this revision. · Jun 20 2023, 2:27 AM

Herald added a project: Restricted Project. · Jun 20 2023, 2:27 AM

Herald added a subscriber: llvm-commits.

dmgreen added inline comments. · Jun 20 2023, 3:27 AM

llvm/lib/Target/AArch64/AArch64InstrInfo.td
Line 6335: It is probably quite a minor point, but can you change this to a `(v4i32 (SUBREG_TO_REG (i64 0), (UADDLVv8i16v V128:$op), ssub))`. The EXTRACT_SUBREG is using the fact that the higher lanes will be implicitly zeroed.

jaykang10 added inline comments. · Jun 20 2023, 4:11 AM

llvm/lib/Target/AArch64/AArch64InstrInfo.td
Line 6335: Let me update the pattern.

Following @dmgreen's comment, updated the pattern.

Harbormaster completed remote builds in B239969: Diff 532864. · Jun 20 2023, 6:03 AM

dmgreen added inline comments. · Jun 20 2023, 6:23 AM

llvm/lib/Target/AArch64/AArch64InstrInfo.td
Line 6340: This one too, as it returns a h reg.

jaykang10 added inline comments. · Jun 20 2023, 6:38 AM

llvm/lib/Target/AArch64/AArch64InstrInfo.td
Line 6340: Sorry. Let me update the pattern.

Following @dmgreen's comment, updated pattern.

Thanks. LGTM

This revision is now accepted and ready to land. · Jun 20 2023, 6:42 AM

This revision was landed with ongoing or failed builds. · Jun 20 2023, 7:15 AM

Closed by commit rGcce08185b4b5: [AArch64] Try to fold uaddlv and uaddlp (authored by jaykang10).

This revision was automatically updated to reflect the committed changes.

jaykang10 added a commit: rGcce08185b4b5: [AArch64] Try to fold uaddlv and uaddlp.

Harbormaster completed remote builds in B239990: Diff 532895. · Jun 20 2023, 7:20 AM

Revision Contents

Path                                                   Size
llvm/lib/Target/AArch64/AArch64InstrInfo.td            11 lines
llvm/test/CodeGen/AArch64/uaddlv-vaddlp-combine.ll     32 lines

Diff 532914

llvm/lib/Target/AArch64/AArch64InstrInfo.td


[6,323 unchanged lines not shown]
   def : Pat<(v2i32 (AArch64uaddv (v2i32 (addlp (v4i16 V64:$op))))),
             (INSERT_SUBREG (v2i32 (IMPLICIT_DEF)), (!cast<Instruction>(Opc#"v4i16v") V64:$op), ssub)>;
   def : Pat<(v2i64 (AArch64uaddv (v2i64 (addlp (v4i32 V128:$op))))),
             (INSERT_SUBREG (v2i64 (IMPLICIT_DEF)), (!cast<Instruction>(Opc#"v4i32v") V128:$op), dsub)>;
 }

 defm : SIMDAcrossLaneLongPairIntrinsic<"UADDLV", AArch64uaddlp>;
 defm : SIMDAcrossLaneLongPairIntrinsic<"SADDLV", AArch64saddlp>;

+// Patterns for uaddlv(uaddlp(x)) ==> uaddlv
+def : Pat<(i64 (int_aarch64_neon_uaddlv (v4i32 (AArch64uaddlp (v8i16 V128:$op))))),
+          (i64 (EXTRACT_SUBREG
+            (v4i32 (SUBREG_TO_REG (i64 0), (UADDLVv8i16v V128:$op), ssub)),
+            dsub))>;
+
+def : Pat<(i32 (int_aarch64_neon_uaddlv (v8i16 (AArch64uaddlp (v16i8 V128:$op))))),
+          (i32 (EXTRACT_SUBREG
+            (v8i16 (SUBREG_TO_REG (i64 0), (UADDLVv16i8v V128:$op), hsub)),
+            ssub))>;
+
 // Patterns for across-vector intrinsics, that have a node equivalent, that
 // returns a vector (with only the low lane defined) instead of a scalar.
 // In effect, opNode is the same as (scalar_to_vector (IntNode)).
 multiclass SIMDAcrossLanesIntrinsic<string baseOpc,
                                     SDPatternOperator opNode> {
   // If a lane instruction caught the vector_extract around opNode, we can
   // directly match the latter to the instruction.
   def : Pat<(v8i8 (opNode V64:$Rn)),
[2,705 unchanged lines not shown]

llvm/test/CodeGen/AArch64/uaddlv-vaddlp-combine.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
; RUN: llc -mtriple aarch64-none-linux-gnu < %s | FileCheck %s

define i32 @uaddlv_uaddlp_v8i16(<8 x i16> %0) {
; CHECK-LABEL: uaddlv_uaddlp_v8i16:
; CHECK: // %bb.0:
; CHECK-NEXT: uaddlv s0, v0.8h
; CHECK-NEXT: fmov x0, d0
; CHECK-NEXT: // kill: def $w0 killed $w0 killed $x0
; CHECK-NEXT: ret
  %2 = tail call <4 x i32> @llvm.aarch64.neon.uaddlp.v4i32.v8i16(<8 x i16> %0)
  %3 = tail call i64 @llvm.aarch64.neon.uaddlv.i64.v4i32(<4 x i32> %2)
  %4 = trunc i64 %3 to i32
  ret i32 %4
}

define i16 @uaddlv_uaddlp_v16i8(<16 x i8> %0) {
; CHECK-LABEL: uaddlv_uaddlp_v16i8:
; CHECK: // %bb.0:
; CHECK-NEXT: uaddlv h0, v0.16b
; CHECK-NEXT: fmov w0, s0
; CHECK-NEXT: ret
  %2 = tail call <8 x i16> @llvm.aarch64.neon.uaddlp.v8i16.v16i8(<16 x i8> %0)
  %3 = tail call i32 @llvm.aarch64.neon.uaddlv.i32.v8i16(<8 x i16> %2)
  %4 = trunc i32 %3 to i16
  ret i16 %4
}

declare i64 @llvm.aarch64.neon.uaddlv.i64.v4i32(<4 x i32>)
declare i32 @llvm.aarch64.neon.uaddlv.i32.v8i16(<8 x i16>)
declare <4 x i32> @llvm.aarch64.neon.uaddlp.v4i32.v8i16(<8 x i16>)
declare <8 x i16> @llvm.aarch64.neon.uaddlp.v8i16.v16i8(<16 x i8>)