This is an archive of the discontinued LLVM Phabricator instance.

Differential D78252

[AArch64] FMLA/FMLS patterns improvement.
ClosedPublic

Authored by ilinpv on Apr 15 2020, 4:04 PM.

Details

Reviewers

samparker
dmgreen
SjoerdMeijer

Summary

Adds f16 indexed (by-element) patterns for FMLA/FMLS.
Fixes https://bugs.llvm.org/show_bug.cgi?id=45467
Removes the redundant v2f32 vector_extract indexed pattern, since
instruction selection is able to match the v4f32 pattern instead.
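
For readers unfamiliar with the by-element forms, here is a minimal scalar sketch of what an indexed FMLA/FMLS computes: every lane of the accumulator is updated with the product of the corresponding lane of one operand and a single selected lane of the other. This is only a reference model, not the TableGen patterns from this revision. The function names and the `c_lanes` parameter are hypothetical, `float` stands in for `half`, and the plain multiply-then-add ignores the single rounding of a real fused operation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference model of "fmla Vd, Vn, Vm.h[lane]":
 * acc[i] += b[i] * c[lane] for every lane i.
 * float stands in for half; rounding is not modelled. */
static void fmla_by_element(float *acc, const float *b, const float *c,
                            size_t n, size_t c_lanes, size_t lane) {
    assert(lane < c_lanes);          /* lane must exist in the indexed operand */
    const float scalar = c[lane];    /* the one element that gets broadcast */
    for (size_t i = 0; i < n; ++i)
        acc[i] = acc[i] + b[i] * scalar;
}

/* Hypothetical reference model of "fmls Vd, Vn, Vm.h[lane]":
 * acc[i] -= b[i] * c[lane] for every lane i. */
static void fmls_by_element(float *acc, const float *b, const float *c,
                            size_t n, size_t c_lanes, size_t lane) {
    assert(lane < c_lanes);
    const float scalar = c[lane];
    for (size_t i = 0; i < n; ++i)
        acc[i] = acc[i] - b[i] * scalar;
}
```

In this model, the subtract form takes the place of a separate negate followed by the add form, which matches the test updates in this revision where an `fneg` plus `fmla` check becomes a single `fmls` check.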

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ilinpv created this revision. · Apr 15 2020, 4:04 PM

Herald added a project: Restricted Project. · Apr 15 2020, 4:04 PM

Herald added subscribers: cfe-commits, danielkiss, hiraditya, kristof.beyls.

srhines added a subscriber: srhines. · Apr 15 2020, 4:11 PM

Harbormaster failed remote builds in B53450: Diff 257881! · Apr 15 2020, 4:34 PM

dmgreen added inline comments. · Apr 15 2020, 11:09 PM

llvm/lib/Target/AArch64/AArch64InstrFormats.td
8055	Should we also have patterns equivalent to the f32 ones below? That is, using DUP, a D vector (4xf16), and possibly a vector_extract too.

More patterns added.

Harbormaster failed remote builds in B53729: Diff 258337! · Apr 17 2020, 8:37 AM

ilinpv added inline comments. · Apr 17 2020, 9:25 AM

llvm/lib/Target/AArch64/AArch64InstrFormats.td
8055	I'm worried about the performance impact of changing fmadd/fmsub to fmla/fmls in the last pattern.

dmgreen added inline comments. · Apr 18 2020, 6:15 AM

llvm/lib/Target/AArch64/AArch64InstrFormats.td
8055	What performance impact are you worried about?
8077	Do you mean the v4f16 variant of this pattern?

ilinpv marked an inline comment as not done. · Apr 18 2020, 7:20 AM

ilinpv added inline comments.

llvm/lib/Target/AArch64/AArch64InstrFormats.td
8055	I mean: could fmla/fmls take more cycles than fmadd/fmsub? Does this replacement improve performance at all?
8077	This pattern simply replaces fmadd/fmsub with fmla/fmls, so it is questionable whether it is useful. The v4f16 vector_extract variant has no test cases at all.

Patterns corrected, vector_extract tests added.

ilinpv marked 2 inline comments as done. · Apr 20 2020, 5:19 PM

v2f32 pattern removed, test added.

LGTM. Thanks

llvm/lib/Target/AArch64/AArch64InstrFormats.td
8094	I was a little surprised when you said we could remove these, but it looks like the vector_extract (v2f32) is always converted to a vector_extract (v4f32 insert_subvector (v2f32)). So I agree, seems Ok to remove. (And if we do run into a problem, we can always add it back in).
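
The equivalence described in this comment can be illustrated with a small sketch: reading lane `i` of a 2-element vector gives the same value as reading lane `i` of a 4-element vector whose low half holds that 2-element vector. The struct types and helper names below are hypothetical stand-ins for the DAG nodes, not LLVM APIs, and the upper lanes are zeroed here only to keep the C well defined (in the DAG they would be undefined).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the v2f32 and v4f32 value types. */
typedef struct { float v[2]; } v2f32;
typedef struct { float v[4]; } v4f32;

/* Models insert_subvector at index 0: the narrow vector becomes the
 * low half of a wide one. Upper lanes are zeroed in this sketch. */
static v4f32 widen_v2f32(v2f32 narrow) {
    v4f32 wide = {{0.0f, 0.0f, 0.0f, 0.0f}};
    memcpy(wide.v, narrow.v, sizeof narrow.v);
    return wide;
}

/* Models vector_extract on each type. */
static float extract_v2f32(v2f32 x, unsigned lane) {
    assert(lane < 2);
    return x.v[lane];
}

static float extract_v4f32(v4f32 x, unsigned lane) {
    assert(lane < 4);
    return x.v[lane];
}
```

Because lanes 0 and 1 are unchanged by the widening, a pattern written only for the 4-element extract can serve both cases, which is the reasoning given for dropping the 2-element pattern.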

This revision is now accepted and ready to land. · Apr 21 2020, 8:45 AM

Harbormaster failed remote builds in B54102: Diff 259008! · Apr 21 2020, 9:10 AM

Committed be881e2831735d6879ee43710f5a4d1c8d50c615

ab added a subscriber: ab. · Apr 21 2020, 8:07 PM

ab added inline comments.

llvm/lib/Target/AArch64/AArch64InstrFormats.td
8058	Should this be V128_lo? I don't think this is encodable for Rm in V16-V31 (same in the other indexed f16 variants I think)

ilinpv marked 2 inline comments as done. · Apr 22 2020, 6:17 AM

ilinpv added inline comments.

llvm/lib/Target/AArch64/AArch64InstrFormats.td
8058	Yep, I double checked encoding, you are right. Thank you very much for this. Fixed in 4eca1c06a4a9183fcf7bb230d894617caf3cf3be

Patterns corrected to comply with encoding 4eca1c06a4a9183fcf7bb230d894617caf3cf3be

ab added inline comments. · Apr 22 2020, 3:03 PM

llvm/lib/Target/AArch64/AArch64InstrFormats.td
8058	Thanks Pavel! I think this applies to the `AArch64dup` variants too, which does entail adding `FPR16Op_lo` and `FPR16_lo` I imagine, and maybe a couple more

ilinpv marked an inline comment as done. · Apr 23 2020, 3:48 PM

ilinpv added inline comments.

llvm/lib/Target/AArch64/AArch64InstrFormats.td
8058	Oops. Thanks again, fix landed cc457672e628846c20e92c6e0a82896f0d6db031
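
The register restriction discussed in this thread can be sketched as a simple predicate. The assumption here, based on my reading of the A64 by-element encodings, is that a 16-bit element index needs three bits (lanes 0 to 7), which leaves only four bits for Rm, so only V0 to V15 can be named. 32-bit and 64-bit elements need fewer index bits and keep the full five-bit Rm. The function name is hypothetical and this is not code from LLVM.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical check: can register V<rm> with lane index <lane> be used
 * as the indexed operand of a by-element FP instruction?
 * Assumed layout: 16-bit elements use a 3-bit index and a 4-bit Rm,
 * 32-bit elements a 2-bit index, 64-bit elements a 1-bit index. */
static bool rm_encodable(unsigned elem_bits, unsigned rm, unsigned lane) {
    if (rm > 31)
        return false;                 /* no such vector register */
    switch (elem_bits) {
    case 16: return rm < 16 && lane < 8;  /* V0-V15 only */
    case 32: return lane < 4;             /* any of V0-V31 */
    case 64: return lane < 2;             /* any of V0-V31 */
    default: return false;
    }
}
```

Under this assumption, restricting the f16 patterns to a low-half register class such as `V128_lo` keeps the register allocator from choosing V16 to V31 for the indexed operand.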

ilinpv mentioned this in D78928: fp16 indexed patterns V16-V31 registers test cases. · Apr 27 2020, 7:22 AM

Revision Contents

Path	Size
clang/test/CodeGen/aarch64-v8.2a-neon-intrinsics-constrained.c	32 lines
llvm/lib/Target/AArch64/AArch64InstrFormats.td	28 lines
llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll	60 lines

Diff 258865

clang/test/CodeGen/aarch64-v8.2a-neon-intrinsics-constrained.c

	(first 83 lines not shown)
	// COMMONIR: [[TMP1:%.*]] = bitcast <4 x half> %b to <8 x i8>			// COMMONIR: [[TMP1:%.*]] = bitcast <4 x half> %b to <8 x i8>
	// COMMONIR: [[TMP2:%.*]] = bitcast <4 x half> %c to <8 x i8>			// COMMONIR: [[TMP2:%.*]] = bitcast <4 x half> %c to <8 x i8>
	// COMMONIR: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP2]] to <4 x half>			// COMMONIR: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP2]] to <4 x half>
	// COMMONIR: [[LANE:%.*]] = shufflevector <4 x half> [[TMP3]], <4 x half> [[TMP3]], <4 x i32> <i32 3, i32 3, i32 3, i32 3>			// COMMONIR: [[LANE:%.*]] = shufflevector <4 x half> [[TMP3]], <4 x half> [[TMP3]], <4 x i32> <i32 3, i32 3, i32 3, i32 3>
	// COMMONIR: [[TMP4:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x half>			// COMMONIR: [[TMP4:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x half>
	// COMMONIR: [[TMP5:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x half>			// COMMONIR: [[TMP5:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x half>
	// UNCONSTRAINED: [[FMLA:%.*]] = call <4 x half> @llvm.fma.v4f16(<4 x half> [[TMP4]], <4 x half> [[LANE]], <4 x half> [[TMP5]])			// UNCONSTRAINED: [[FMLA:%.*]] = call <4 x half> @llvm.fma.v4f16(<4 x half> [[TMP4]], <4 x half> [[LANE]], <4 x half> [[TMP5]])
	// CONSTRAINED: [[FMLA:%.*]] = call <4 x half> @llvm.experimental.constrained.fma.v4f16(<4 x half> [[TMP4]], <4 x half> [[LANE]], <4 x half> [[TMP5]], metadata !"round.tonearest", metadata !"fpexcept.strict")			// CONSTRAINED: [[FMLA:%.*]] = call <4 x half> @llvm.experimental.constrained.fma.v4f16(<4 x half> [[TMP4]], <4 x half> [[LANE]], <4 x half> [[TMP5]], metadata !"round.tonearest", metadata !"fpexcept.strict")
	// CHECK-ASM: fmla v{{[0-9]+}}.4h, v{{[0-9]+}}.4h, v{{[0-9]+}}.4h			// CHECK-ASM: fmla v{{[0-9]+}}.4h, v{{[0-9]+}}.4h, v{{[0-9]+}}.h[{{[0-9]+}}]
	// COMMONIR: ret <4 x half> [[FMLA]]			// COMMONIR: ret <4 x half> [[FMLA]]
	float16x4_t test_vfma_lane_f16(float16x4_t a, float16x4_t b, float16x4_t c) {			float16x4_t test_vfma_lane_f16(float16x4_t a, float16x4_t b, float16x4_t c) {
	return vfma_lane_f16(a, b, c, 3);			return vfma_lane_f16(a, b, c, 3);
	}			}

	// COMMON-LABEL: test_vfmaq_lane_f16			// COMMON-LABEL: test_vfmaq_lane_f16
	// COMMONIR: [[TMP0:%.*]] = bitcast <8 x half> %a to <16 x i8>			// COMMONIR: [[TMP0:%.*]] = bitcast <8 x half> %a to <16 x i8>
	// COMMONIR: [[TMP1:%.*]] = bitcast <8 x half> %b to <16 x i8>			// COMMONIR: [[TMP1:%.*]] = bitcast <8 x half> %b to <16 x i8>
	// COMMONIR: [[TMP2:%.*]] = bitcast <4 x half> %c to <8 x i8>			// COMMONIR: [[TMP2:%.*]] = bitcast <4 x half> %c to <8 x i8>
	// COMMONIR: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP2]] to <4 x half>			// COMMONIR: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP2]] to <4 x half>
	// COMMONIR: [[LANE:%.*]] = shufflevector <4 x half> [[TMP3]], <4 x half> [[TMP3]], <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>			// COMMONIR: [[LANE:%.*]] = shufflevector <4 x half> [[TMP3]], <4 x half> [[TMP3]], <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
	// COMMONIR: [[TMP4:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x half>			// COMMONIR: [[TMP4:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x half>
	// COMMONIR: [[TMP5:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x half>			// COMMONIR: [[TMP5:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x half>
	// UNCONSTRAINED: [[FMLA:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[TMP4]], <8 x half> [[LANE]], <8 x half> [[TMP5]])			// UNCONSTRAINED: [[FMLA:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[TMP4]], <8 x half> [[LANE]], <8 x half> [[TMP5]])
	// CONSTRAINED: [[FMLA:%.*]] = call <8 x half> @llvm.experimental.constrained.fma.v8f16(<8 x half> [[TMP4]], <8 x half> [[LANE]], <8 x half> [[TMP5]], metadata !"round.tonearest", metadata !"fpexcept.strict")			// CONSTRAINED: [[FMLA:%.*]] = call <8 x half> @llvm.experimental.constrained.fma.v8f16(<8 x half> [[TMP4]], <8 x half> [[LANE]], <8 x half> [[TMP5]], metadata !"round.tonearest", metadata !"fpexcept.strict")
	// CHECK-ASM: fmla v{{[0-9]+}}.8h, v{{[0-9]+}}.8h, v{{[0-9]+}}.8h			// CHECK-ASM: fmla v{{[0-9]+}}.8h, v{{[0-9]+}}.8h, v{{[0-9]+}}.h[{{[0-9]+}}]
	// COMMONIR: ret <8 x half> [[FMLA]]			// COMMONIR: ret <8 x half> [[FMLA]]
	float16x8_t test_vfmaq_lane_f16(float16x8_t a, float16x8_t b, float16x4_t c) {			float16x8_t test_vfmaq_lane_f16(float16x8_t a, float16x8_t b, float16x4_t c) {
	return vfmaq_lane_f16(a, b, c, 3);			return vfmaq_lane_f16(a, b, c, 3);
	}			}

	// COMMON-LABEL: test_vfma_laneq_f16			// COMMON-LABEL: test_vfma_laneq_f16
	// COMMONIR: [[TMP0:%.*]] = bitcast <4 x half> %a to <8 x i8>			// COMMONIR: [[TMP0:%.*]] = bitcast <4 x half> %a to <8 x i8>
	// COMMONIR: [[TMP1:%.*]] = bitcast <4 x half> %b to <8 x i8>			// COMMONIR: [[TMP1:%.*]] = bitcast <4 x half> %b to <8 x i8>
	(15 lines not shown)
	// COMMONIR: [[TMP1:%.*]] = bitcast <8 x half> %b to <16 x i8>			// COMMONIR: [[TMP1:%.*]] = bitcast <8 x half> %b to <16 x i8>
	// COMMONIR: [[TMP2:%.*]] = bitcast <8 x half> %c to <16 x i8>			// COMMONIR: [[TMP2:%.*]] = bitcast <8 x half> %c to <16 x i8>
	// COMMONIR: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x half>			// COMMONIR: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x half>
	// COMMONIR: [[TMP4:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x half>			// COMMONIR: [[TMP4:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x half>
	// COMMONIR: [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to <8 x half>			// COMMONIR: [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to <8 x half>
	// COMMONIR: [[LANE:%.*]] = shufflevector <8 x half> [[TMP5]], <8 x half> [[TMP5]], <8 x i32> <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>			// COMMONIR: [[LANE:%.*]] = shufflevector <8 x half> [[TMP5]], <8 x half> [[TMP5]], <8 x i32> <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
	// UNCONSTRAINED: [[FMLA:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[LANE]], <8 x half> [[TMP4]], <8 x half> [[TMP3]])			// UNCONSTRAINED: [[FMLA:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[LANE]], <8 x half> [[TMP4]], <8 x half> [[TMP3]])
	// CONSTRAINED: [[FMLA:%.*]] = call <8 x half> @llvm.experimental.constrained.fma.v8f16(<8 x half> [[LANE]], <8 x half> [[TMP4]], <8 x half> [[TMP3]], metadata !"round.tonearest", metadata !"fpexcept.strict")			// CONSTRAINED: [[FMLA:%.*]] = call <8 x half> @llvm.experimental.constrained.fma.v8f16(<8 x half> [[LANE]], <8 x half> [[TMP4]], <8 x half> [[TMP3]], metadata !"round.tonearest", metadata !"fpexcept.strict")
	// CHECK-ASM: fmla v{{[0-9]+}}.8h, v{{[0-9]+}}.8h, v{{[0-9]+}}.8h			// CHECK-ASM: fmla v{{[0-9]+}}.8h, v{{[0-9]+}}.8h, v{{[0-9]+}}.h[{{[0-9]+}}]
	// COMMONIR: ret <8 x half> [[FMLA]]			// COMMONIR: ret <8 x half> [[FMLA]]
	float16x8_t test_vfmaq_laneq_f16(float16x8_t a, float16x8_t b, float16x8_t c) {			float16x8_t test_vfmaq_laneq_f16(float16x8_t a, float16x8_t b, float16x8_t c) {
	return vfmaq_laneq_f16(a, b, c, 7);			return vfmaq_laneq_f16(a, b, c, 7);
	}			}

	// COMMON-LABEL: test_vfma_n_f16			// COMMON-LABEL: test_vfma_n_f16
	// COMMONIR: [[TMP0:%.*]] = insertelement <4 x half> undef, half %c, i32 0			// COMMONIR: [[TMP0:%.*]] = insertelement <4 x half> undef, half %c, i32 0
	// COMMONIR: [[TMP1:%.*]] = insertelement <4 x half> [[TMP0]], half %c, i32 1			// COMMONIR: [[TMP1:%.*]] = insertelement <4 x half> [[TMP0]], half %c, i32 1
	// COMMONIR: [[TMP2:%.*]] = insertelement <4 x half> [[TMP1]], half %c, i32 2			// COMMONIR: [[TMP2:%.*]] = insertelement <4 x half> [[TMP1]], half %c, i32 2
	// COMMONIR: [[TMP3:%.*]] = insertelement <4 x half> [[TMP2]], half %c, i32 3			// COMMONIR: [[TMP3:%.*]] = insertelement <4 x half> [[TMP2]], half %c, i32 3
	// UNCONSTRAINED: [[FMA:%.*]] = call <4 x half> @llvm.fma.v4f16(<4 x half> %b, <4 x half> [[TMP3]], <4 x half> %a)			// UNCONSTRAINED: [[FMA:%.*]] = call <4 x half> @llvm.fma.v4f16(<4 x half> %b, <4 x half> [[TMP3]], <4 x half> %a)
	// CONSTRAINED: [[FMA:%.*]] = call <4 x half> @llvm.experimental.constrained.fma.v4f16(<4 x half> %b, <4 x half> [[TMP3]], <4 x half> %a, metadata !"round.tonearest", metadata !"fpexcept.strict")			// CONSTRAINED: [[FMA:%.*]] = call <4 x half> @llvm.experimental.constrained.fma.v4f16(<4 x half> %b, <4 x half> [[TMP3]], <4 x half> %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
	// CHECK-ASM: fmla v{{[0-9]+}}.4h, v{{[0-9]+}}.4h, v{{[0-9]+}}.4h			// CHECK-ASM: fmla v{{[0-9]+}}.4h, v{{[0-9]+}}.4h, v{{[0-9]+}}.h[{{[0-9]+}}]
	// COMMONIR: ret <4 x half> [[FMA]]			// COMMONIR: ret <4 x half> [[FMA]]
	float16x4_t test_vfma_n_f16(float16x4_t a, float16x4_t b, float16_t c) {			float16x4_t test_vfma_n_f16(float16x4_t a, float16x4_t b, float16_t c) {
	return vfma_n_f16(a, b, c);			return vfma_n_f16(a, b, c);
	}			}

	// COMMON-LABEL: test_vfmaq_n_f16			// COMMON-LABEL: test_vfmaq_n_f16
	// COMMONIR: [[TMP0:%.*]] = insertelement <8 x half> undef, half %c, i32 0			// COMMONIR: [[TMP0:%.*]] = insertelement <8 x half> undef, half %c, i32 0
	// COMMONIR: [[TMP1:%.*]] = insertelement <8 x half> [[TMP0]], half %c, i32 1			// COMMONIR: [[TMP1:%.*]] = insertelement <8 x half> [[TMP0]], half %c, i32 1
	// COMMONIR: [[TMP2:%.*]] = insertelement <8 x half> [[TMP1]], half %c, i32 2			// COMMONIR: [[TMP2:%.*]] = insertelement <8 x half> [[TMP1]], half %c, i32 2
	// COMMONIR: [[TMP3:%.*]] = insertelement <8 x half> [[TMP2]], half %c, i32 3			// COMMONIR: [[TMP3:%.*]] = insertelement <8 x half> [[TMP2]], half %c, i32 3
	// COMMONIR: [[TMP4:%.*]] = insertelement <8 x half> [[TMP3]], half %c, i32 4			// COMMONIR: [[TMP4:%.*]] = insertelement <8 x half> [[TMP3]], half %c, i32 4
	// COMMONIR: [[TMP5:%.*]] = insertelement <8 x half> [[TMP4]], half %c, i32 5			// COMMONIR: [[TMP5:%.*]] = insertelement <8 x half> [[TMP4]], half %c, i32 5
	// COMMONIR: [[TMP6:%.*]] = insertelement <8 x half> [[TMP5]], half %c, i32 6			// COMMONIR: [[TMP6:%.*]] = insertelement <8 x half> [[TMP5]], half %c, i32 6
	// COMMONIR: [[TMP7:%.*]] = insertelement <8 x half> [[TMP6]], half %c, i32 7			// COMMONIR: [[TMP7:%.*]] = insertelement <8 x half> [[TMP6]], half %c, i32 7
	// UNCONSTRAINED: [[FMA:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> %b, <8 x half> [[TMP7]], <8 x half> %a)			// UNCONSTRAINED: [[FMA:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> %b, <8 x half> [[TMP7]], <8 x half> %a)
	// CONSTRAINED: [[FMA:%.*]] = call <8 x half> @llvm.experimental.constrained.fma.v8f16(<8 x half> %b, <8 x half> [[TMP7]], <8 x half> %a, metadata !"round.tonearest", metadata !"fpexcept.strict")			// CONSTRAINED: [[FMA:%.*]] = call <8 x half> @llvm.experimental.constrained.fma.v8f16(<8 x half> %b, <8 x half> [[TMP7]], <8 x half> %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
	// CHECK-ASM: fmla v{{[0-9]+}}.8h, v{{[0-9]+}}.8h, v{{[0-9]+}}.8h			// CHECK-ASM: fmla v{{[0-9]+}}.8h, v{{[0-9]+}}.8h, v{{[0-9]+}}.h[{{[0-9]+}}]
	// COMMONIR: ret <8 x half> [[FMA]]			// COMMONIR: ret <8 x half> [[FMA]]
	float16x8_t test_vfmaq_n_f16(float16x8_t a, float16x8_t b, float16_t c) {			float16x8_t test_vfmaq_n_f16(float16x8_t a, float16x8_t b, float16_t c) {
	return vfmaq_n_f16(a, b, c);			return vfmaq_n_f16(a, b, c);
	}			}

	// COMMON-LABEL: test_vfmah_lane_f16			// COMMON-LABEL: test_vfmah_lane_f16
	// COMMONIR: [[EXTR:%.*]] = extractelement <4 x half> %c, i32 3			// COMMONIR: [[EXTR:%.*]] = extractelement <4 x half> %c, i32 3
	// UNCONSTRAINED: [[FMA:%.*]] = call half @llvm.fma.f16(half %b, half [[EXTR]], half %a)			// UNCONSTRAINED: [[FMA:%.*]] = call half @llvm.fma.f16(half %b, half [[EXTR]], half %a)
	// CONSTRAINED: [[FMA:%.*]] = call half @llvm.experimental.constrained.fma.f16(half %b, half [[EXTR]], half %a, metadata !"round.tonearest", metadata !"fpexcept.strict")			// CONSTRAINED: [[FMA:%.*]] = call half @llvm.experimental.constrained.fma.f16(half %b, half [[EXTR]], half %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
	// CHECK-ASM: fmadd h{{[0-9]+}}, h{{[0-9]+}}, h{{[0-9]+}}, h{{[0-9]+}}			// CHECK-ASM: fmla h{{[0-9]+}}, h{{[0-9]+}}, v{{[0-9]+}}.h[{{[0-9]+}}]
	// COMMONIR: ret half [[FMA]]			// COMMONIR: ret half [[FMA]]
	float16_t test_vfmah_lane_f16(float16_t a, float16_t b, float16x4_t c) {			float16_t test_vfmah_lane_f16(float16_t a, float16_t b, float16x4_t c) {
	return vfmah_lane_f16(a, b, c, 3);			return vfmah_lane_f16(a, b, c, 3);
	}			}

	// COMMON-LABEL: test_vfmah_laneq_f16			// COMMON-LABEL: test_vfmah_laneq_f16
	// COMMONIR: [[EXTR:%.*]] = extractelement <8 x half> %c, i32 7			// COMMONIR: [[EXTR:%.*]] = extractelement <8 x half> %c, i32 7
	// UNCONSTRAINED: [[FMA:%.*]] = call half @llvm.fma.f16(half %b, half [[EXTR]], half %a)			// UNCONSTRAINED: [[FMA:%.*]] = call half @llvm.fma.f16(half %b, half [[EXTR]], half %a)
	// CONSTRAINED: [[FMA:%.*]] = call half @llvm.experimental.constrained.fma.f16(half %b, half [[EXTR]], half %a, metadata !"round.tonearest", metadata !"fpexcept.strict")			// CONSTRAINED: [[FMA:%.*]] = call half @llvm.experimental.constrained.fma.f16(half %b, half [[EXTR]], half %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
	// CHECK-ASM: fmadd h{{[0-9]+}}, h{{[0-9]+}}, h{{[0-9]+}}, h{{[0-9]+}}			// CHECK-ASM: fmla h{{[0-9]+}}, h{{[0-9]+}}, v{{[0-9]+}}.h[{{[0-9]+}}]
	// COMMONIR: ret half [[FMA]]			// COMMONIR: ret half [[FMA]]
	float16_t test_vfmah_laneq_f16(float16_t a, float16_t b, float16x8_t c) {			float16_t test_vfmah_laneq_f16(float16_t a, float16_t b, float16x8_t c) {
	return vfmah_laneq_f16(a, b, c, 7);			return vfmah_laneq_f16(a, b, c, 7);
	}			}

	// COMMON-LABEL: test_vfms_lane_f16			// COMMON-LABEL: test_vfms_lane_f16
	// COMMONIR: [[SUB:%.*]] = fneg <4 x half> %b			// COMMONIR: [[SUB:%.*]] = fneg <4 x half> %b
	// CHECK-ASM: fneg v{{[0-9]+}}.4h, v{{[0-9]+}}.4h
	// COMMONIR: [[TMP0:%.*]] = bitcast <4 x half> %a to <8 x i8>			// COMMONIR: [[TMP0:%.*]] = bitcast <4 x half> %a to <8 x i8>
	// COMMONIR: [[TMP1:%.*]] = bitcast <4 x half> [[SUB]] to <8 x i8>			// COMMONIR: [[TMP1:%.*]] = bitcast <4 x half> [[SUB]] to <8 x i8>
	// COMMONIR: [[TMP2:%.*]] = bitcast <4 x half> %c to <8 x i8>			// COMMONIR: [[TMP2:%.*]] = bitcast <4 x half> %c to <8 x i8>
	// COMMONIR: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP2]] to <4 x half>			// COMMONIR: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP2]] to <4 x half>
	// COMMONIR: [[LANE:%.*]] = shufflevector <4 x half> [[TMP3]], <4 x half> [[TMP3]], <4 x i32> <i32 3, i32 3, i32 3, i32 3>			// COMMONIR: [[LANE:%.*]] = shufflevector <4 x half> [[TMP3]], <4 x half> [[TMP3]], <4 x i32> <i32 3, i32 3, i32 3, i32 3>
	// COMMONIR: [[TMP4:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x half>			// COMMONIR: [[TMP4:%.*]] = bitcast <8 x i8> [[TMP1]] to <4 x half>
	// COMMONIR: [[TMP5:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x half>			// COMMONIR: [[TMP5:%.*]] = bitcast <8 x i8> [[TMP0]] to <4 x half>
	// UNCONSTRAINED: [[FMA:%.*]] = call <4 x half> @llvm.fma.v4f16(<4 x half> [[TMP4]], <4 x half> [[LANE]], <4 x half> [[TMP5]])			// UNCONSTRAINED: [[FMA:%.*]] = call <4 x half> @llvm.fma.v4f16(<4 x half> [[TMP4]], <4 x half> [[LANE]], <4 x half> [[TMP5]])
	// CONSTRAINED: [[FMA:%.*]] = call <4 x half> @llvm.experimental.constrained.fma.v4f16(<4 x half> [[TMP4]], <4 x half> [[LANE]], <4 x half> [[TMP5]], metadata !"round.tonearest", metadata !"fpexcept.strict")			// CONSTRAINED: [[FMA:%.*]] = call <4 x half> @llvm.experimental.constrained.fma.v4f16(<4 x half> [[TMP4]], <4 x half> [[LANE]], <4 x half> [[TMP5]], metadata !"round.tonearest", metadata !"fpexcept.strict")
	// CHECK-ASM: fmla v{{[0-9]+}}.4h, v{{[0-9]+}}.4h, v{{[0-9]+}}.4h			// CHECK-ASM: fmls v{{[0-9]+}}.4h, v{{[0-9]+}}.4h, v{{[0-9]+}}.h[{{[0-9]+}}]
	// COMMONIR: ret <4 x half> [[FMA]]			// COMMONIR: ret <4 x half> [[FMA]]
	float16x4_t test_vfms_lane_f16(float16x4_t a, float16x4_t b, float16x4_t c) {			float16x4_t test_vfms_lane_f16(float16x4_t a, float16x4_t b, float16x4_t c) {
	return vfms_lane_f16(a, b, c, 3);			return vfms_lane_f16(a, b, c, 3);
	}			}

	// COMMON-LABEL: test_vfmsq_lane_f16			// COMMON-LABEL: test_vfmsq_lane_f16
	// COMMONIR: [[SUB:%.*]] = fneg <8 x half> %b			// COMMONIR: [[SUB:%.*]] = fneg <8 x half> %b
	// CHECK-ASM: fneg v{{[0-9]+}}.8h, v{{[0-9]+}}.8h
	// COMMONIR: [[TMP0:%.*]] = bitcast <8 x half> %a to <16 x i8>			// COMMONIR: [[TMP0:%.*]] = bitcast <8 x half> %a to <16 x i8>
	// COMMONIR: [[TMP1:%.*]] = bitcast <8 x half> [[SUB]] to <16 x i8>			// COMMONIR: [[TMP1:%.*]] = bitcast <8 x half> [[SUB]] to <16 x i8>
	// COMMONIR: [[TMP2:%.*]] = bitcast <4 x half> %c to <8 x i8>			// COMMONIR: [[TMP2:%.*]] = bitcast <4 x half> %c to <8 x i8>
	// COMMONIR: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP2]] to <4 x half>			// COMMONIR: [[TMP3:%.*]] = bitcast <8 x i8> [[TMP2]] to <4 x half>
	// COMMONIR: [[LANE:%.*]] = shufflevector <4 x half> [[TMP3]], <4 x half> [[TMP3]], <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>			// COMMONIR: [[LANE:%.*]] = shufflevector <4 x half> [[TMP3]], <4 x half> [[TMP3]], <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
	// COMMONIR: [[TMP4:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x half>			// COMMONIR: [[TMP4:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x half>
	// COMMONIR: [[TMP5:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x half>			// COMMONIR: [[TMP5:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x half>
	// UNCONSTRAINED: [[FMLA:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[TMP4]], <8 x half> [[LANE]], <8 x half> [[TMP5]])			// UNCONSTRAINED: [[FMLA:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[TMP4]], <8 x half> [[LANE]], <8 x half> [[TMP5]])
	// CONSTRAINED: [[FMLA:%.*]] = call <8 x half> @llvm.experimental.constrained.fma.v8f16(<8 x half> [[TMP4]], <8 x half> [[LANE]], <8 x half> [[TMP5]], metadata !"round.tonearest", metadata !"fpexcept.strict")			// CONSTRAINED: [[FMLA:%.*]] = call <8 x half> @llvm.experimental.constrained.fma.v8f16(<8 x half> [[TMP4]], <8 x half> [[LANE]], <8 x half> [[TMP5]], metadata !"round.tonearest", metadata !"fpexcept.strict")
	// CHECK-ASM: fmla v{{[0-9]+}}.8h, v{{[0-9]+}}.8h, v{{[0-9]+}}.8h			// CHECK-ASM: fmls v{{[0-9]+}}.8h, v{{[0-9]+}}.8h, v{{[0-9]+}}.h[{{[0-9]+}}]
	// COMMONIR: ret <8 x half> [[FMLA]]			// COMMONIR: ret <8 x half> [[FMLA]]
	float16x8_t test_vfmsq_lane_f16(float16x8_t a, float16x8_t b, float16x4_t c) {			float16x8_t test_vfmsq_lane_f16(float16x8_t a, float16x8_t b, float16x4_t c) {
	return vfmsq_lane_f16(a, b, c, 3);			return vfmsq_lane_f16(a, b, c, 3);
	}			}

	// COMMON-LABEL: test_vfms_laneq_f16			// COMMON-LABEL: test_vfms_laneq_f16
	// COMMONIR: [[SUB:%.*]] = fneg <4 x half> %b			// COMMONIR: [[SUB:%.*]] = fneg <4 x half> %b
	// CHECK-ASM-NOT: fneg			// CHECK-ASM-NOT: fneg
	(19 lines not shown)
	// COMMONIR: [[TMP1:%.*]] = bitcast <8 x half> [[SUB]] to <16 x i8>			// COMMONIR: [[TMP1:%.*]] = bitcast <8 x half> [[SUB]] to <16 x i8>
	// COMMONIR: [[TMP2:%.*]] = bitcast <8 x half> %c to <16 x i8>			// COMMONIR: [[TMP2:%.*]] = bitcast <8 x half> %c to <16 x i8>
	// COMMONIR: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x half>			// COMMONIR: [[TMP3:%.*]] = bitcast <16 x i8> [[TMP0]] to <8 x half>
	// COMMONIR: [[TMP4:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x half>			// COMMONIR: [[TMP4:%.*]] = bitcast <16 x i8> [[TMP1]] to <8 x half>
	// COMMONIR: [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to <8 x half>			// COMMONIR: [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to <8 x half>
	// COMMONIR: [[LANE:%.*]] = shufflevector <8 x half> [[TMP5]], <8 x half> [[TMP5]], <8 x i32> <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>			// COMMONIR: [[LANE:%.*]] = shufflevector <8 x half> [[TMP5]], <8 x half> [[TMP5]], <8 x i32> <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
	// UNCONSTRAINED: [[FMLA:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[LANE]], <8 x half> [[TMP4]], <8 x half> [[TMP3]])			// UNCONSTRAINED: [[FMLA:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[LANE]], <8 x half> [[TMP4]], <8 x half> [[TMP3]])
	// CONSTRAINED: [[FMLA:%.*]] = call <8 x half> @llvm.experimental.constrained.fma.v8f16(<8 x half> [[LANE]], <8 x half> [[TMP4]], <8 x half> [[TMP3]], metadata !"round.tonearest", metadata !"fpexcept.strict")			// CONSTRAINED: [[FMLA:%.*]] = call <8 x half> @llvm.experimental.constrained.fma.v8f16(<8 x half> [[LANE]], <8 x half> [[TMP4]], <8 x half> [[TMP3]], metadata !"round.tonearest", metadata !"fpexcept.strict")
	// CHECK-ASM: fmls v{{[0-9]+}}.8h, v{{[0-9]+}}.8h, v{{[0-9]+}}.8h			// CHECK-ASM: fmls v{{[0-9]+}}.8h, v{{[0-9]+}}.8h, v{{[0-9]+}}.h[{{[0-9]+}}]
	// COMMONIR: ret <8 x half> [[FMLA]]			// COMMONIR: ret <8 x half> [[FMLA]]
	float16x8_t test_vfmsq_laneq_f16(float16x8_t a, float16x8_t b, float16x8_t c) {			float16x8_t test_vfmsq_laneq_f16(float16x8_t a, float16x8_t b, float16x8_t c) {
	return vfmsq_laneq_f16(a, b, c, 7);			return vfmsq_laneq_f16(a, b, c, 7);
	}			}

	// COMMON-LABEL: test_vfms_n_f16			// COMMON-LABEL: test_vfms_n_f16
	// COMMONIR: [[SUB:%.*]] = fneg <4 x half> %b			// COMMONIR: [[SUB:%.*]] = fneg <4 x half> %b
	// CHECK-ASM: fneg v{{[0-9]+}}.4h, v{{[0-9]+}}.4h
	// COMMONIR: [[TMP0:%.*]] = insertelement <4 x half> undef, half %c, i32 0			// COMMONIR: [[TMP0:%.*]] = insertelement <4 x half> undef, half %c, i32 0
	// COMMONIR: [[TMP1:%.*]] = insertelement <4 x half> [[TMP0]], half %c, i32 1			// COMMONIR: [[TMP1:%.*]] = insertelement <4 x half> [[TMP0]], half %c, i32 1
	// COMMONIR: [[TMP2:%.*]] = insertelement <4 x half> [[TMP1]], half %c, i32 2			// COMMONIR: [[TMP2:%.*]] = insertelement <4 x half> [[TMP1]], half %c, i32 2
	// COMMONIR: [[TMP3:%.*]] = insertelement <4 x half> [[TMP2]], half %c, i32 3			// COMMONIR: [[TMP3:%.*]] = insertelement <4 x half> [[TMP2]], half %c, i32 3
	// UNCONSTRAINED: [[FMA:%.*]] = call <4 x half> @llvm.fma.v4f16(<4 x half> [[SUB]], <4 x half> [[TMP3]], <4 x half> %a)			// UNCONSTRAINED: [[FMA:%.*]] = call <4 x half> @llvm.fma.v4f16(<4 x half> [[SUB]], <4 x half> [[TMP3]], <4 x half> %a)
	// CONSTRAINED: [[FMA:%.*]] = call <4 x half> @llvm.experimental.constrained.fma.v4f16(<4 x half> [[SUB]], <4 x half> [[TMP3]], <4 x half> %a, metadata !"round.tonearest", metadata !"fpexcept.strict")			// CONSTRAINED: [[FMA:%.*]] = call <4 x half> @llvm.experimental.constrained.fma.v4f16(<4 x half> [[SUB]], <4 x half> [[TMP3]], <4 x half> %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
	// CHECK-ASM: fmla v{{[0-9]+}}.4h, v{{[0-9]+}}.4h, v{{[0-9]+}}.4h			// CHECK-ASM: fmls v{{[0-9]+}}.4h, v{{[0-9]+}}.4h, v{{[0-9]+}}.h[{{[0-9]+}}]
	// COMMONIR: ret <4 x half> [[FMA]]			// COMMONIR: ret <4 x half> [[FMA]]
	float16x4_t test_vfms_n_f16(float16x4_t a, float16x4_t b, float16_t c) {			float16x4_t test_vfms_n_f16(float16x4_t a, float16x4_t b, float16_t c) {
	return vfms_n_f16(a, b, c);			return vfms_n_f16(a, b, c);
	}			}

	// COMMON-LABEL: test_vfmsq_n_f16			// COMMON-LABEL: test_vfmsq_n_f16
	// COMMONIR: [[SUB:%.*]] = fneg <8 x half> %b			// COMMONIR: [[SUB:%.*]] = fneg <8 x half> %b
	// CHECK-ASM: fneg v{{[0-9]+}}.8h, v{{[0-9]+}}.8h
	// COMMONIR: [[TMP0:%.*]] = insertelement <8 x half> undef, half %c, i32 0			// COMMONIR: [[TMP0:%.*]] = insertelement <8 x half> undef, half %c, i32 0
	// COMMONIR: [[TMP1:%.*]] = insertelement <8 x half> [[TMP0]], half %c, i32 1			// COMMONIR: [[TMP1:%.*]] = insertelement <8 x half> [[TMP0]], half %c, i32 1
	// COMMONIR: [[TMP2:%.*]] = insertelement <8 x half> [[TMP1]], half %c, i32 2			// COMMONIR: [[TMP2:%.*]] = insertelement <8 x half> [[TMP1]], half %c, i32 2
	// COMMONIR: [[TMP3:%.*]] = insertelement <8 x half> [[TMP2]], half %c, i32 3			// COMMONIR: [[TMP3:%.*]] = insertelement <8 x half> [[TMP2]], half %c, i32 3
	// COMMONIR: [[TMP4:%.*]] = insertelement <8 x half> [[TMP3]], half %c, i32 4			// COMMONIR: [[TMP4:%.*]] = insertelement <8 x half> [[TMP3]], half %c, i32 4
	// COMMONIR: [[TMP5:%.*]] = insertelement <8 x half> [[TMP4]], half %c, i32 5			// COMMONIR: [[TMP5:%.*]] = insertelement <8 x half> [[TMP4]], half %c, i32 5
	// COMMONIR: [[TMP6:%.*]] = insertelement <8 x half> [[TMP5]], half %c, i32 6			// COMMONIR: [[TMP6:%.*]] = insertelement <8 x half> [[TMP5]], half %c, i32 6
	// COMMONIR: [[TMP7:%.*]] = insertelement <8 x half> [[TMP6]], half %c, i32 7			// COMMONIR: [[TMP7:%.*]] = insertelement <8 x half> [[TMP6]], half %c, i32 7
	// UNCONSTRAINED: [[FMA:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[SUB]], <8 x half> [[TMP7]], <8 x half> %a)			// UNCONSTRAINED: [[FMA:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[SUB]], <8 x half> [[TMP7]], <8 x half> %a)
	// CONSTRAINED: [[FMA:%.*]] = call <8 x half> @llvm.experimental.constrained.fma.v8f16(<8 x half> [[SUB]], <8 x half> [[TMP7]], <8 x half> %a, metadata !"round.tonearest", metadata !"fpexcept.strict")			// CONSTRAINED: [[FMA:%.*]] = call <8 x half> @llvm.experimental.constrained.fma.v8f16(<8 x half> [[SUB]], <8 x half> [[TMP7]], <8 x half> %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
	// CHECK-ASM: fmla v{{[0-9]+}}.8h, v{{[0-9]+}}.8h, v{{[0-9]+}}.8h			// CHECK-ASM: fmls v{{[0-9]+}}.8h, v{{[0-9]+}}.8h, v{{[0-9]+}}.h[{{[0-9]+}}]
	// COMMONIR: ret <8 x half> [[FMA]]			// COMMONIR: ret <8 x half> [[FMA]]
	float16x8_t test_vfmsq_n_f16(float16x8_t a, float16x8_t b, float16_t c) {			float16x8_t test_vfmsq_n_f16(float16x8_t a, float16x8_t b, float16_t c) {
	return vfmsq_n_f16(a, b, c);			return vfmsq_n_f16(a, b, c);
	}			}

	// COMMON-LABEL: test_vfmsh_lane_f16			// COMMON-LABEL: test_vfmsh_lane_f16
	// UNCONSTRAINED: [[TMP0:%.*]] = fpext half %b to float			// UNCONSTRAINED: [[TMP0:%.*]] = fpext half %b to float
	// CONSTRAINED: [[TMP0:%.*]] = call float @llvm.experimental.constrained.fpext.f32.f16(half %b, metadata !"fpexcept.strict")			// CONSTRAINED: [[TMP0:%.*]] = call float @llvm.experimental.constrained.fpext.f32.f16(half %b, metadata !"fpexcept.strict")
	// CHECK-ASM: fcvt s{{[0-9]+}}, h{{[0-9]+}}			// CHECK-ASM: fcvt s{{[0-9]+}}, h{{[0-9]+}}
	// COMMONIR: [[TMP1:%.*]] = fneg float [[TMP0]]			// COMMONIR: [[TMP1:%.*]] = fneg float [[TMP0]]
	// CHECK-ASM: fneg s{{[0-9]+}}, s{{[0-9]+}}			// CHECK-ASM: fneg s{{[0-9]+}}, s{{[0-9]+}}
	// UNCONSTRAINED: [[SUB:%.*]] = fptrunc float [[TMP1]] to half			// UNCONSTRAINED: [[SUB:%.*]] = fptrunc float [[TMP1]] to half
	// CONSTRAINED: [[SUB:%.*]] = call half @llvm.experimental.constrained.fptrunc.f16.f32(float [[TMP1]], metadata !"round.tonearest", metadata !"fpexcept.strict")			// CONSTRAINED: [[SUB:%.*]] = call half @llvm.experimental.constrained.fptrunc.f16.f32(float [[TMP1]], metadata !"round.tonearest", metadata !"fpexcept.strict")
	// CHECK-ASM: fcvt h{{[0-9]+}}, s{{[0-9]+}}			// CHECK-ASM: fcvt h{{[0-9]+}}, s{{[0-9]+}}
	// COMMONIR: [[EXTR:%.*]] = extractelement <4 x half> %c, i32 3			// COMMONIR: [[EXTR:%.*]] = extractelement <4 x half> %c, i32 3
	// UNCONSTRAINED: [[FMA:%.*]] = call half @llvm.fma.f16(half [[SUB]], half [[EXTR]], half %a)			// UNCONSTRAINED: [[FMA:%.*]] = call half @llvm.fma.f16(half [[SUB]], half [[EXTR]], half %a)
	// CONSTRAINED: [[FMA:%.*]] = call half @llvm.experimental.constrained.fma.f16(half [[SUB]], half [[EXTR]], half %a, metadata !"round.tonearest", metadata !"fpexcept.strict")			// CONSTRAINED: [[FMA:%.*]] = call half @llvm.experimental.constrained.fma.f16(half [[SUB]], half [[EXTR]], half %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
	// CHECK-ASM: fmadd h{{[0-9]+}}, h{{[0-9]+}}, h{{[0-9]+}}, h{{[0-9]+}}			// CHECK-ASM: fmla h{{[0-9]+}}, h{{[0-9]+}}, v{{[0-9]+}}.h[{{[0-9]+}}]
	// COMMONIR: ret half [[FMA]]			// COMMONIR: ret half [[FMA]]
	float16_t test_vfmsh_lane_f16(float16_t a, float16_t b, float16x4_t c) {			float16_t test_vfmsh_lane_f16(float16_t a, float16_t b, float16x4_t c) {
	return vfmsh_lane_f16(a, b, c, 3);			return vfmsh_lane_f16(a, b, c, 3);
	}			}

	// COMMON-LABEL: test_vfmsh_laneq_f16			// COMMON-LABEL: test_vfmsh_laneq_f16
	// UNCONSTRAINED: [[TMP0:%.*]] = fpext half %b to float			// UNCONSTRAINED: [[TMP0:%.*]] = fpext half %b to float
	// CONSTRAINED: [[TMP0:%.*]] = call float @llvm.experimental.constrained.fpext.f32.f16(half %b, metadata !"fpexcept.strict")			// CONSTRAINED: [[TMP0:%.*]] = call float @llvm.experimental.constrained.fpext.f32.f16(half %b, metadata !"fpexcept.strict")
	// CHECK-ASM: fcvt s{{[0-9]+}}, h{{[0-9]+}}			// CHECK-ASM: fcvt s{{[0-9]+}}, h{{[0-9]+}}
	// COMMONIR: [[TMP1:%.*]] = fneg float [[TMP0]]			// COMMONIR: [[TMP1:%.*]] = fneg float [[TMP0]]
	// CHECK-ASM: fneg s{{[0-9]+}}, s{{[0-9]+}}			// CHECK-ASM: fneg s{{[0-9]+}}, s{{[0-9]+}}
	// UNCONSTRAINED: [[SUB:%.*]] = fptrunc float [[TMP1]] to half			// UNCONSTRAINED: [[SUB:%.*]] = fptrunc float [[TMP1]] to half
	// CONSTRAINED: [[SUB:%.*]] = call half @llvm.experimental.constrained.fptrunc.f16.f32(float [[TMP1]], metadata !"round.tonearest", metadata !"fpexcept.strict")			// CONSTRAINED: [[SUB:%.*]] = call half @llvm.experimental.constrained.fptrunc.f16.f32(float [[TMP1]], metadata !"round.tonearest", metadata !"fpexcept.strict")
	// CHECK-ASM: fcvt h{{[0-9]+}}, s{{[0-9]+}}			// CHECK-ASM: fcvt h{{[0-9]+}}, s{{[0-9]+}}
	// COMMONIR: [[EXTR:%.*]] = extractelement <8 x half> %c, i32 7			// COMMONIR: [[EXTR:%.*]] = extractelement <8 x half> %c, i32 7
	// UNCONSTRAINED: [[FMA:%.*]] = call half @llvm.fma.f16(half [[SUB]], half [[EXTR]], half %a)			// UNCONSTRAINED: [[FMA:%.*]] = call half @llvm.fma.f16(half [[SUB]], half [[EXTR]], half %a)
	// CONSTRAINED: [[FMA:%.*]] = call half @llvm.experimental.constrained.fma.f16(half [[SUB]], half [[EXTR]], half %a, metadata !"round.tonearest", metadata !"fpexcept.strict")			// CONSTRAINED: [[FMA:%.*]] = call half @llvm.experimental.constrained.fma.f16(half [[SUB]], half [[EXTR]], half %a, metadata !"round.tonearest", metadata !"fpexcept.strict")
	// CHECK-ASM: fmadd h{{[0-9]+}}, h{{[0-9]+}}, h{{[0-9]+}}, h{{[0-9]+}}			// CHECK-ASM: fmla h{{[0-9]+}}, h{{[0-9]+}}, v{{[0-9]+}}.h[{{[0-9]+}}]
	// COMMONIR: ret half [[FMA]]			// COMMONIR: ret half [[FMA]]
	float16_t test_vfmsh_laneq_f16(float16_t a, float16_t b, float16x8_t c) {			float16_t test_vfmsh_laneq_f16(float16_t a, float16_t b, float16x8_t c) {
	return vfmsh_laneq_f16(a, b, c, 7);			return vfmsh_laneq_f16(a, b, c, 7);
	}			}
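For readers checking the FileCheck lines above, the IR computes `fma(-b, c[lane], a)`, i.e. `a - b * c[lane]`. The following is a minimal reference sketch of that arithmetic, not the intrinsic itself. It uses `float` instead of `float16_t` and an unfused multiply-add so that it builds on any host. The name `ref_fms_lane` and its `lanes` parameter are hypothetical and do not appear in the patch.

```c
#include <assert.h>

/* Hypothetical reference model for the lane-indexed multiply-subtract
 * exercised by test_vfmsh_lane_f16 / test_vfmsh_laneq_f16.
 * Assumptions: float stands in for float16_t, and the multiply-add is
 * not fused, so rounding can differ from the real FMLS instruction. */
static float ref_fms_lane(float a, float b, const float *c, int lanes, int lane) {
    /* The real intrinsics take a compile-time constant lane in range. */
    assert(lane >= 0 && lane < lanes);
    /* Mirrors the IR shape: negate b, then multiply-add into a. */
    return a + (-b) * c[lane];
}
```

With `lanes = 4` and `lane = 3` this models the `vfmsh_lane_f16(a, b, c, 3)` call; with `lanes = 8` and `lane = 7` it models the `vfmsh_laneq_f16(a, b, c, 7)` call.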

llvm/lib/Target/AArch64/AArch64InstrFormats.td


		[(set (f64 FPR64Op:$Rd),
VectorIndexD:$idx))))]> {		VectorIndexD:$idx))))]> {
bits<1> idx;		bits<1> idx;
let Inst{11} = idx{0};		let Inst{11} = idx{0};
let Inst{21} = 0;		let Inst{21} = 0;
}		}
}		}

multiclass SIMDFPIndexedTiedPatterns<string INST, SDPatternOperator OpNode> {		multiclass SIMDFPIndexedTiedPatterns<string INST, SDPatternOperator OpNode> {
		let Predicates = [HasNEON, HasFullFP16] in {
		dmgreen: Should we have equal patterns to those below for f32 as well? So using DUP, D vector (4xf16) and possibly from a vector_extract too.
		ilinpv (author): I'm worried about the performance impact of changing fmadd/sub -> fmla/ls in the last pattern case.
		dmgreen: What performance impact are you worried about?
		ilinpv (author): I mean, can fmla/ls take more cycles than fmadd/sub? Is there any performance improvement from such a replacement?
		// Patterns for f16: DUPLANE, DUP scalar and vector_extract.
		def : Pat<(v8f16 (OpNode (v8f16 V128:$Rd), (v8f16 V128:$Rn),
		(AArch64duplane16 (v8f16 V128:$Rm),
		ab: Should this be V128_lo? I don't think this is encodable for Rm in V16-V31 (same in the other indexed f16 variants I think)
		ilinpv (author): Yep, I double checked encoding, you are right. Thank you very much for this. Fixed in 4eca1c06a4a9183fcf7bb230d894617caf3cf3be
		ab: Thanks Pavel! I think this applies to the `AArch64dup` variants too, which does entail adding `FPR16Op_lo` and `FPR16_lo` I imagine, and maybe a couple more
		ilinpv (author): Oops. Thanks again, fix landed cc457672e628846c20e92c6e0a82896f0d6db031
		VectorIndexH:$idx))),
		(!cast<Instruction>(INST # "v8i16_indexed")
		V128:$Rd, V128:$Rn, V128:$Rm, VectorIndexH:$idx)>;
		def : Pat<(v8f16 (OpNode (v8f16 V128:$Rd), (v8f16 V128:$Rn),
		(AArch64dup (f16 FPR16Op:$Rm)))),
		(!cast<Instruction>(INST # "v8i16_indexed") V128:$Rd, V128:$Rn,
		(SUBREG_TO_REG (i32 0), FPR16Op:$Rm, hsub), (i64 0))>;

		def : Pat<(v4f16 (OpNode (v4f16 V64:$Rd), (v4f16 V64:$Rn),
		(AArch64duplane16 (v8f16 V128:$Rm),
		VectorIndexS:$idx))),
		(!cast<Instruction>(INST # "v4i16_indexed")
		V64:$Rd, V64:$Rn, V128:$Rm, VectorIndexS:$idx)>;
		def : Pat<(v4f16 (OpNode (v4f16 V64:$Rd), (v4f16 V64:$Rn),
		(AArch64dup (f16 FPR16Op:$Rm)))),
		(!cast<Instruction>(INST # "v4i16_indexed") V64:$Rd, V64:$Rn,
		(SUBREG_TO_REG (i32 0), FPR16Op:$Rm, hsub), (i64 0))>;

		def : Pat<(f16 (OpNode (f16 FPR16:$Rd), (f16 FPR16:$Rn),
		dmgreen: Do you mean the v4f16 variant of this pattern?
		ilinpv (author): This pattern does exactly replace fmadd/sub with fmla/ls, so it is questionable whether this pattern is useful. The v4f16 vector_extract variant has no test cases at all.
		(vector_extract (v8f16 V128:$Rm), VectorIndexH:$idx))),
		(!cast<Instruction>(INST # "v1i16_indexed") FPR16:$Rd, FPR16:$Rn,
		V128:$Rm, VectorIndexH:$idx)>;
		} // Predicates = [HasNEON, HasFullFP16]

// 2 variants for the .2s version: DUPLANE from 128-bit and DUP scalar.		// 2 variants for the .2s version: DUPLANE from 128-bit and DUP scalar.
def : Pat<(v2f32 (OpNode (v2f32 V64:$Rd), (v2f32 V64:$Rn),		def : Pat<(v2f32 (OpNode (v2f32 V64:$Rd), (v2f32 V64:$Rn),
(AArch64duplane32 (v4f32 V128:$Rm),		(AArch64duplane32 (v4f32 V128:$Rm),
VectorIndexS:$idx))),		VectorIndexS:$idx))),
(!cast<Instruction>(INST # v2i32_indexed)		(!cast<Instruction>(INST # v2i32_indexed)
V64:$Rd, V64:$Rn, V128:$Rm, VectorIndexS:$idx)>;		V64:$Rd, V64:$Rn, V128:$Rm, VectorIndexS:$idx)>;
def : Pat<(v2f32 (OpNode (v2f32 V64:$Rd), (v2f32 V64:$Rn),		def : Pat<(v2f32 (OpNode (v2f32 V64:$Rd), (v2f32 V64:$Rn),
(AArch64dup (f32 FPR32Op:$Rm)))),		(AArch64dup (f32 FPR32Op:$Rm)))),
		def : Pat<(v2f64 (OpNode (v2f64 V128:$Rd), (v2f64 V128:$Rn),
(!cast<Instruction>(INST # "v2i64_indexed") V128:$Rd, V128:$Rn,		(!cast<Instruction>(INST # "v2i64_indexed") V128:$Rd, V128:$Rn,
(SUBREG_TO_REG (i32 0), FPR64Op:$Rm, dsub), (i64 0))>;		(SUBREG_TO_REG (i32 0), FPR64Op:$Rm, dsub), (i64 0))>;

// 2 variants for 32-bit scalar version: extract from .2s or from .4s		// 2 variants for 32-bit scalar version: extract from .2s or from .4s
def : Pat<(f32 (OpNode (f32 FPR32:$Rd), (f32 FPR32:$Rn),		def : Pat<(f32 (OpNode (f32 FPR32:$Rd), (f32 FPR32:$Rn),
(vector_extract (v4f32 V128:$Rm), VectorIndexS:$idx))),		(vector_extract (v4f32 V128:$Rm), VectorIndexS:$idx))),
(!cast<Instruction>(INST # "v1i32_indexed") FPR32:$Rd, FPR32:$Rn,		(!cast<Instruction>(INST # "v1i32_indexed") FPR32:$Rd, FPR32:$Rn,
V128:$Rm, VectorIndexS:$idx)>;		V128:$Rm, VectorIndexS:$idx)>;
def : Pat<(f32 (OpNode (f32 FPR32:$Rd), (f32 FPR32:$Rn),		def : Pat<(f32 (OpNode (f32 FPR32:$Rd), (f32 FPR32:$Rn),
dmgreen: I was a little surprised when you said we could remove these, but it looks like the vector_extract (v2f32) is always converted to a vector_extract (v4f32 insert_subvector (v2f32)). So I agree, seems Ok to remove. (And if we do run into a problem, we can always add it back in).
(vector_extract (v2f32 V64:$Rm), VectorIndexS:$idx))),		(vector_extract (v2f32 V64:$Rm), VectorIndexS:$idx))),
(!cast<Instruction>(INST # "v1i32_indexed") FPR32:$Rd, FPR32:$Rn,		(!cast<Instruction>(INST # "v1i32_indexed") FPR32:$Rd, FPR32:$Rn,
(SUBREG_TO_REG (i32 0), V64:$Rm, dsub), VectorIndexS:$idx)>;		(SUBREG_TO_REG (i32 0), V64:$Rm, dsub), VectorIndexS:$idx)>;

// 1 variant for 64-bit scalar version: extract from .1d or from .2d		// 1 variant for 64-bit scalar version: extract from .1d or from .2d
def : Pat<(f64 (OpNode (f64 FPR64:$Rd), (f64 FPR64:$Rn),		def : Pat<(f64 (OpNode (f64 FPR64:$Rd), (f64 FPR64:$Rn),
(vector_extract (v2f64 V128:$Rm), VectorIndexD:$idx))),		(vector_extract (v2f64 V128:$Rm), VectorIndexD:$idx))),
(!cast<Instruction>(INST # "v1i64_indexed") FPR64:$Rd, FPR64:$Rn,		(!cast<Instruction>(INST # "v1i64_indexed") FPR64:$Rd, FPR64:$Rn,
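The inline thread on the f16 patterns above points out that Rm in V16-V31 is not encodable for the H-element indexed forms. A small sketch of why, assuming the by-element field layout as I read it from the architecture manual: the 3-bit lane index is split across bits H (11), L (21) and M (20), which leaves only bits 19:16 for Rm. The helper names `h_indexed_encodable` and `h_indexed_fields` are hypothetical and are not LLVM APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: with 16-bit elements the M bit carries index bit 0,
 * so Rm has only 4 bits and must name V0..V15; the lane index is 0..7. */
static bool h_indexed_encodable(unsigned rm, unsigned idx) {
    return rm <= 15 && idx <= 7;
}

/* Hypothetical helper: packs only the Rm and index fields of the encoding
 * (assumed positions: H = bit 11, L = bit 21, M = bit 20, Rm = bits 19:16).
 * Callers are expected to check h_indexed_encodable() first. */
static uint32_t h_indexed_fields(unsigned rm, unsigned idx) {
    uint32_t h = (idx >> 2) & 1u;
    uint32_t l = (idx >> 1) & 1u;
    uint32_t m = idx & 1u;
    return (h << 11) | (l << 21) | (m << 20) | ((uint32_t)(rm & 0xFu) << 16);
}
```

This is the restriction that a register class such as `V128_lo` expresses to the register allocator, which is why the follow-up commits mentioned in the thread switch the Rm operand to the `_lo` classes.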

llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=aarch64-eabi -mattr=+v8.2a,+fullfp16 \| FileCheck %s			; RUN: llc < %s -mtriple=aarch64-eabi -mattr=+v8.2a,+fullfp16 \| FileCheck %s

	declare half @llvm.aarch64.neon.fmulx.f16(half, half)			declare half @llvm.aarch64.neon.fmulx.f16(half, half)
	declare <4 x half> @llvm.aarch64.neon.fmulx.v4f16(<4 x half>, <4 x half>)			declare <4 x half> @llvm.aarch64.neon.fmulx.v4f16(<4 x half>, <4 x half>)
	declare <8 x half> @llvm.aarch64.neon.fmulx.v8f16(<8 x half>, <8 x half>)			declare <8 x half> @llvm.aarch64.neon.fmulx.v8f16(<8 x half>, <8 x half>)
	declare <4 x half> @llvm.fma.v4f16(<4 x half>, <4 x half>, <4 x half>)			declare <4 x half> @llvm.fma.v4f16(<4 x half>, <4 x half>, <4 x half>)
	declare <8 x half> @llvm.fma.v8f16(<8 x half>, <8 x half>, <8 x half>)			declare <8 x half> @llvm.fma.v8f16(<8 x half>, <8 x half>, <8 x half>)
	declare half @llvm.fma.f16(half, half, half) #1			declare half @llvm.fma.f16(half, half, half) #1

	define dso_local <4 x half> @t_vfma_lane_f16(<4 x half> %a, <4 x half> %b, <4 x half> %c, i32 %lane) {			define dso_local <4 x half> @t_vfma_lane_f16(<4 x half> %a, <4 x half> %b, <4 x half> %c, i32 %lane) {
	; CHECK-LABEL: t_vfma_lane_f16:			; CHECK-LABEL: t_vfma_lane_f16:
	; CHECK: .Lt_vfma_lane_f16$local:			; CHECK: .Lt_vfma_lane_f16$local:
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: // %bb.0: // %entry			; CHECK-NEXT: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d2 killed $d2 def $q2			; CHECK-NEXT: // kill: def $d2 killed $d2 def $q2
	; CHECK-NEXT: dup v2.4h, v2.h[0]			; CHECK-NEXT: fmla v0.4h, v1.4h, v2.h[0]
	; CHECK-NEXT: fmla v0.4h, v2.4h, v1.4h
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%lane1 = shufflevector <4 x half> %c, <4 x half> undef, <4 x i32> zeroinitializer			%lane1 = shufflevector <4 x half> %c, <4 x half> undef, <4 x i32> zeroinitializer
	%fmla3 = tail call <4 x half> @llvm.fma.v4f16(<4 x half> %b, <4 x half> %lane1, <4 x half> %a)			%fmla3 = tail call <4 x half> @llvm.fma.v4f16(<4 x half> %b, <4 x half> %lane1, <4 x half> %a)
	ret <4 x half> %fmla3			ret <4 x half> %fmla3
	}			}

	define dso_local <8 x half> @t_vfmaq_lane_f16(<8 x half> %a, <8 x half> %b, <4 x half> %c, i32 %lane) {			define dso_local <8 x half> @t_vfmaq_lane_f16(<8 x half> %a, <8 x half> %b, <4 x half> %c, i32 %lane) {
	; CHECK-LABEL: t_vfmaq_lane_f16:			; CHECK-LABEL: t_vfmaq_lane_f16:
	; CHECK: .Lt_vfmaq_lane_f16$local:			; CHECK: .Lt_vfmaq_lane_f16$local:
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: // %bb.0: // %entry			; CHECK-NEXT: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d2 killed $d2 def $q2			; CHECK-NEXT: // kill: def $d2 killed $d2 def $q2
	; CHECK-NEXT: dup v2.8h, v2.h[0]			; CHECK-NEXT: fmla v0.8h, v1.8h, v2.h[0]
	; CHECK-NEXT: fmla v0.8h, v2.8h, v1.8h
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%lane1 = shufflevector <4 x half> %c, <4 x half> undef, <8 x i32> zeroinitializer			%lane1 = shufflevector <4 x half> %c, <4 x half> undef, <8 x i32> zeroinitializer
	%fmla3 = tail call <8 x half> @llvm.fma.v8f16(<8 x half> %b, <8 x half> %lane1, <8 x half> %a)			%fmla3 = tail call <8 x half> @llvm.fma.v8f16(<8 x half> %b, <8 x half> %lane1, <8 x half> %a)
	ret <8 x half> %fmla3			ret <8 x half> %fmla3
	}			}

	define dso_local <4 x half> @t_vfma_laneq_f16(<4 x half> %a, <4 x half> %b, <8 x half> %c, i32 %lane) {			define dso_local <4 x half> @t_vfma_laneq_f16(<4 x half> %a, <4 x half> %b, <8 x half> %c, i32 %lane) {
	; CHECK-LABEL: t_vfma_laneq_f16:			; CHECK-LABEL: t_vfma_laneq_f16:
	; CHECK: .Lt_vfma_laneq_f16$local:			; CHECK: .Lt_vfma_laneq_f16$local:
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: // %bb.0: // %entry			; CHECK-NEXT: // %bb.0: // %entry
	; CHECK-NEXT: dup v2.4h, v2.h[0]			; CHECK-NEXT: fmla v0.4h, v1.4h, v2.h[0]
	; CHECK-NEXT: fmla v0.4h, v1.4h, v2.4h
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%lane1 = shufflevector <8 x half> %c, <8 x half> undef, <4 x i32> zeroinitializer			%lane1 = shufflevector <8 x half> %c, <8 x half> undef, <4 x i32> zeroinitializer
	%0 = tail call <4 x half> @llvm.fma.v4f16(<4 x half> %lane1, <4 x half> %b, <4 x half> %a)			%0 = tail call <4 x half> @llvm.fma.v4f16(<4 x half> %lane1, <4 x half> %b, <4 x half> %a)
	ret <4 x half> %0			ret <4 x half> %0
	}			}

	define dso_local <8 x half> @t_vfmaq_laneq_f16(<8 x half> %a, <8 x half> %b, <8 x half> %c, i32 %lane) {			define dso_local <8 x half> @t_vfmaq_laneq_f16(<8 x half> %a, <8 x half> %b, <8 x half> %c, i32 %lane) {
	; CHECK-LABEL: t_vfmaq_laneq_f16:			; CHECK-LABEL: t_vfmaq_laneq_f16:
	; CHECK: .Lt_vfmaq_laneq_f16$local:			; CHECK: .Lt_vfmaq_laneq_f16$local:
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: // %bb.0: // %entry			; CHECK-NEXT: // %bb.0: // %entry
	; CHECK-NEXT: dup v2.8h, v2.h[0]			; CHECK-NEXT: fmla v0.8h, v1.8h, v2.h[0]
	; CHECK-NEXT: fmla v0.8h, v1.8h, v2.8h
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%lane1 = shufflevector <8 x half> %c, <8 x half> undef, <8 x i32> zeroinitializer			%lane1 = shufflevector <8 x half> %c, <8 x half> undef, <8 x i32> zeroinitializer
	%0 = tail call <8 x half> @llvm.fma.v8f16(<8 x half> %lane1, <8 x half> %b, <8 x half> %a)			%0 = tail call <8 x half> @llvm.fma.v8f16(<8 x half> %lane1, <8 x half> %b, <8 x half> %a)
	ret <8 x half> %0			ret <8 x half> %0
	}			}

	define dso_local <4 x half> @t_vfma_n_f16(<4 x half> %a, <4 x half> %b, half %c) {			define dso_local <4 x half> @t_vfma_n_f16(<4 x half> %a, <4 x half> %b, half %c) {
	; CHECK-LABEL: t_vfma_n_f16:			; CHECK-LABEL: t_vfma_n_f16:
	; CHECK: .Lt_vfma_n_f16$local:			; CHECK: .Lt_vfma_n_f16$local:
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: // %bb.0: // %entry			; CHECK-NEXT: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $h2 killed $h2 def $q2			; CHECK-NEXT: // kill: def $h2 killed $h2 def $q2
	; CHECK-NEXT: dup v2.4h, v2.h[0]			; CHECK-NEXT: fmla v0.4h, v1.4h, v2.h[0]
	; CHECK-NEXT: fmla v0.4h, v2.4h, v1.4h
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%vecinit = insertelement <4 x half> undef, half %c, i32 0			%vecinit = insertelement <4 x half> undef, half %c, i32 0
	%vecinit3 = shufflevector <4 x half> %vecinit, <4 x half> undef, <4 x i32> zeroinitializer			%vecinit3 = shufflevector <4 x half> %vecinit, <4 x half> undef, <4 x i32> zeroinitializer
	%0 = tail call <4 x half> @llvm.fma.v4f16(<4 x half> %b, <4 x half> %vecinit3, <4 x half> %a) #4			%0 = tail call <4 x half> @llvm.fma.v4f16(<4 x half> %b, <4 x half> %vecinit3, <4 x half> %a) #4
	ret <4 x half> %0			ret <4 x half> %0
	}			}

	define dso_local <8 x half> @t_vfmaq_n_f16(<8 x half> %a, <8 x half> %b, half %c) {			define dso_local <8 x half> @t_vfmaq_n_f16(<8 x half> %a, <8 x half> %b, half %c) {
	; CHECK-LABEL: t_vfmaq_n_f16:			; CHECK-LABEL: t_vfmaq_n_f16:
	; CHECK: .Lt_vfmaq_n_f16$local:			; CHECK: .Lt_vfmaq_n_f16$local:
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: // %bb.0: // %entry			; CHECK-NEXT: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $h2 killed $h2 def $q2			; CHECK-NEXT: // kill: def $h2 killed $h2 def $q2
	; CHECK-NEXT: dup v2.8h, v2.h[0]			; CHECK-NEXT: fmla v0.8h, v1.8h, v2.h[0]
	; CHECK-NEXT: fmla v0.8h, v2.8h, v1.8h
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%vecinit = insertelement <8 x half> undef, half %c, i32 0			%vecinit = insertelement <8 x half> undef, half %c, i32 0
	%vecinit7 = shufflevector <8 x half> %vecinit, <8 x half> undef, <8 x i32> zeroinitializer			%vecinit7 = shufflevector <8 x half> %vecinit, <8 x half> undef, <8 x i32> zeroinitializer
	%0 = tail call <8 x half> @llvm.fma.v8f16(<8 x half> %b, <8 x half> %vecinit7, <8 x half> %a) #4			%0 = tail call <8 x half> @llvm.fma.v8f16(<8 x half> %b, <8 x half> %vecinit7, <8 x half> %a) #4
	ret <8 x half> %0			ret <8 x half> %0
	}			}

	define dso_local half @t_vfmah_lane_f16(half %a, half %b, <4 x half> %c, i32 %lane) {			define dso_local half @t_vfmah_lane_f16(half %a, half %b, <4 x half> %c, i32 %lane) {
	; CHECK-LABEL: t_vfmah_lane_f16:			; CHECK-LABEL: t_vfmah_lane_f16:
	; CHECK: .Lt_vfmah_lane_f16$local:			; CHECK: .Lt_vfmah_lane_f16$local:
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: // %bb.0: // %entry			; CHECK-NEXT: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d2 killed $d2 def $q2			; CHECK-NEXT: // kill: def $d2 killed $d2 def $q2
	; CHECK-NEXT: fmadd h0, h1, h2, h0			; CHECK-NEXT: fmla h0, h1, v2.h[0]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%extract = extractelement <4 x half> %c, i32 0			%extract = extractelement <4 x half> %c, i32 0
	%0 = tail call half @llvm.fma.f16(half %b, half %extract, half %a)			%0 = tail call half @llvm.fma.f16(half %b, half %extract, half %a)
	ret half %0			ret half %0
	}			}

	define dso_local half @t_vfmah_laneq_f16(half %a, half %b, <8 x half> %c, i32 %lane) {			define dso_local half @t_vfmah_laneq_f16(half %a, half %b, <8 x half> %c, i32 %lane) {
	; CHECK-LABEL: t_vfmah_laneq_f16:			; CHECK-LABEL: t_vfmah_laneq_f16:
	; CHECK: .Lt_vfmah_laneq_f16$local:			; CHECK: .Lt_vfmah_laneq_f16$local:
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: // %bb.0: // %entry			; CHECK-NEXT: // %bb.0: // %entry
	; CHECK-NEXT: fmadd h0, h1, h2, h0			; CHECK-NEXT: fmla h0, h1, v2.h[0]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%extract = extractelement <8 x half> %c, i32 0			%extract = extractelement <8 x half> %c, i32 0
	%0 = tail call half @llvm.fma.f16(half %b, half %extract, half %a)			%0 = tail call half @llvm.fma.f16(half %b, half %extract, half %a)
	ret half %0			ret half %0
	}			}

	define dso_local <4 x half> @t_vfms_lane_f16(<4 x half> %a, <4 x half> %b, <4 x half> %c, i32 %lane) {			define dso_local <4 x half> @t_vfms_lane_f16(<4 x half> %a, <4 x half> %b, <4 x half> %c, i32 %lane) {
	; CHECK-LABEL: t_vfms_lane_f16:			; CHECK-LABEL: t_vfms_lane_f16:
	; CHECK: .Lt_vfms_lane_f16$local:			; CHECK: .Lt_vfms_lane_f16$local:
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: // %bb.0: // %entry			; CHECK-NEXT: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d2 killed $d2 def $q2			; CHECK-NEXT: // kill: def $d2 killed $d2 def $q2
	; CHECK-NEXT: fneg v1.4h, v1.4h			; CHECK-NEXT: fmls v0.4h, v1.4h, v2.h[0]
	; CHECK-NEXT: dup v2.4h, v2.h[0]
	; CHECK-NEXT: fmla v0.4h, v2.4h, v1.4h
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%sub = fsub <4 x half> <half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000>, %b			%sub = fsub <4 x half> <half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000>, %b
	%lane1 = shufflevector <4 x half> %c, <4 x half> undef, <4 x i32> zeroinitializer			%lane1 = shufflevector <4 x half> %c, <4 x half> undef, <4 x i32> zeroinitializer
	%fmla3 = tail call <4 x half> @llvm.fma.v4f16(<4 x half> %sub, <4 x half> %lane1, <4 x half> %a)			%fmla3 = tail call <4 x half> @llvm.fma.v4f16(<4 x half> %sub, <4 x half> %lane1, <4 x half> %a)
	ret <4 x half> %fmla3			ret <4 x half> %fmla3
	}			}

	define dso_local <8 x half> @t_vfmsq_lane_f16(<8 x half> %a, <8 x half> %b, <4 x half> %c, i32 %lane) {			define dso_local <8 x half> @t_vfmsq_lane_f16(<8 x half> %a, <8 x half> %b, <4 x half> %c, i32 %lane) {
	; CHECK-LABEL: t_vfmsq_lane_f16:			; CHECK-LABEL: t_vfmsq_lane_f16:
	; CHECK: .Lt_vfmsq_lane_f16$local:			; CHECK: .Lt_vfmsq_lane_f16$local:
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: // %bb.0: // %entry			; CHECK-NEXT: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d2 killed $d2 def $q2			; CHECK-NEXT: // kill: def $d2 killed $d2 def $q2
	; CHECK-NEXT: fneg v1.8h, v1.8h			; CHECK-NEXT: fmls v0.8h, v1.8h, v2.h[0]
	; CHECK-NEXT: dup v2.8h, v2.h[0]
	; CHECK-NEXT: fmla v0.8h, v2.8h, v1.8h
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%sub = fsub <8 x half> <half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000>, %b			%sub = fsub <8 x half> <half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000>, %b
	%lane1 = shufflevector <4 x half> %c, <4 x half> undef, <8 x i32> zeroinitializer			%lane1 = shufflevector <4 x half> %c, <4 x half> undef, <8 x i32> zeroinitializer
	%fmla3 = tail call <8 x half> @llvm.fma.v8f16(<8 x half> %sub, <8 x half> %lane1, <8 x half> %a)			%fmla3 = tail call <8 x half> @llvm.fma.v8f16(<8 x half> %sub, <8 x half> %lane1, <8 x half> %a)
	ret <8 x half> %fmla3			ret <8 x half> %fmla3
	}			}

	define dso_local <4 x half> @t_vfms_laneq_f16(<4 x half> %a, <4 x half> %b, <8 x half> %c, i32 %lane) {			define dso_local <4 x half> @t_vfms_laneq_f16(<4 x half> %a, <4 x half> %b, <8 x half> %c, i32 %lane) {
	; CHECK-LABEL: t_vfms_laneq_f16:			; CHECK-LABEL: t_vfms_laneq_f16:
	; CHECK: .Lt_vfms_laneq_f16$local:			; CHECK: .Lt_vfms_laneq_f16$local:
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: // %bb.0: // %entry			; CHECK-NEXT: // %bb.0: // %entry
	; CHECK-NEXT: dup v2.4h, v2.h[0]			; CHECK-NEXT: fmls v0.4h, v1.4h, v2.h[0]
	; CHECK-NEXT: fmls v0.4h, v2.4h, v1.4h
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%sub = fsub <4 x half> <half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000>, %b			%sub = fsub <4 x half> <half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000>, %b
	%lane1 = shufflevector <8 x half> %c, <8 x half> undef, <4 x i32> zeroinitializer			%lane1 = shufflevector <8 x half> %c, <8 x half> undef, <4 x i32> zeroinitializer
	%0 = tail call <4 x half> @llvm.fma.v4f16(<4 x half> %lane1, <4 x half> %sub, <4 x half> %a)			%0 = tail call <4 x half> @llvm.fma.v4f16(<4 x half> %lane1, <4 x half> %sub, <4 x half> %a)
	ret <4 x half> %0			ret <4 x half> %0
	}			}

	define dso_local <8 x half> @t_vfmsq_laneq_f16(<8 x half> %a, <8 x half> %b, <8 x half> %c, i32 %lane) {			define dso_local <8 x half> @t_vfmsq_laneq_f16(<8 x half> %a, <8 x half> %b, <8 x half> %c, i32 %lane) {
	; CHECK-LABEL: t_vfmsq_laneq_f16:			; CHECK-LABEL: t_vfmsq_laneq_f16:
	; CHECK: .Lt_vfmsq_laneq_f16$local:			; CHECK: .Lt_vfmsq_laneq_f16$local:
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: // %bb.0: // %entry			; CHECK-NEXT: // %bb.0: // %entry
	; CHECK-NEXT: dup v2.8h, v2.h[0]			; CHECK-NEXT: fmls v0.8h, v1.8h, v2.h[0]
	; CHECK-NEXT: fmls v0.8h, v2.8h, v1.8h
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%sub = fsub <8 x half> <half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000>, %b			%sub = fsub <8 x half> <half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000>, %b
	%lane1 = shufflevector <8 x half> %c, <8 x half> undef, <8 x i32> zeroinitializer			%lane1 = shufflevector <8 x half> %c, <8 x half> undef, <8 x i32> zeroinitializer
	%0 = tail call <8 x half> @llvm.fma.v8f16(<8 x half> %lane1, <8 x half> %sub, <8 x half> %a)			%0 = tail call <8 x half> @llvm.fma.v8f16(<8 x half> %lane1, <8 x half> %sub, <8 x half> %a)
	ret <8 x half> %0			ret <8 x half> %0
	}			}

	define dso_local <4 x half> @t_vfms_n_f16(<4 x half> %a, <4 x half> %b, half %c) {			define dso_local <4 x half> @t_vfms_n_f16(<4 x half> %a, <4 x half> %b, half %c) {
	; CHECK-LABEL: t_vfms_n_f16:			; CHECK-LABEL: t_vfms_n_f16:
	; CHECK: .Lt_vfms_n_f16$local:			; CHECK: .Lt_vfms_n_f16$local:
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: // %bb.0: // %entry			; CHECK-NEXT: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $h2 killed $h2 def $q2			; CHECK-NEXT: // kill: def $h2 killed $h2 def $q2
	; CHECK-NEXT: fneg v1.4h, v1.4h			; CHECK-NEXT: fmls v0.4h, v1.4h, v2.h[0]
	; CHECK-NEXT: dup v2.4h, v2.h[0]
	; CHECK-NEXT: fmla v0.4h, v2.4h, v1.4h
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%sub = fsub <4 x half> <half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000>, %b			%sub = fsub <4 x half> <half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000>, %b
	%vecinit = insertelement <4 x half> undef, half %c, i32 0			%vecinit = insertelement <4 x half> undef, half %c, i32 0
	%vecinit3 = shufflevector <4 x half> %vecinit, <4 x half> undef, <4 x i32> zeroinitializer			%vecinit3 = shufflevector <4 x half> %vecinit, <4 x half> undef, <4 x i32> zeroinitializer
	%0 = tail call <4 x half> @llvm.fma.v4f16(<4 x half> %sub, <4 x half> %vecinit3, <4 x half> %a) #4			%0 = tail call <4 x half> @llvm.fma.v4f16(<4 x half> %sub, <4 x half> %vecinit3, <4 x half> %a) #4
	ret <4 x half> %0			ret <4 x half> %0
	}			}

	define dso_local <8 x half> @t_vfmsq_n_f16(<8 x half> %a, <8 x half> %b, half %c) {			define dso_local <8 x half> @t_vfmsq_n_f16(<8 x half> %a, <8 x half> %b, half %c) {
	; CHECK-LABEL: t_vfmsq_n_f16:			; CHECK-LABEL: t_vfmsq_n_f16:
	; CHECK: .Lt_vfmsq_n_f16$local:			; CHECK: .Lt_vfmsq_n_f16$local:
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: // %bb.0: // %entry			; CHECK-NEXT: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $h2 killed $h2 def $q2			; CHECK-NEXT: // kill: def $h2 killed $h2 def $q2
	; CHECK-NEXT: fneg v1.8h, v1.8h			; CHECK-NEXT: fmls v0.8h, v1.8h, v2.h[0]
	; CHECK-NEXT: dup v2.8h, v2.h[0]
	; CHECK-NEXT: fmla v0.8h, v2.8h, v1.8h
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%sub = fsub <8 x half> <half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000>, %b			%sub = fsub <8 x half> <half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000>, %b
	%vecinit = insertelement <8 x half> undef, half %c, i32 0			%vecinit = insertelement <8 x half> undef, half %c, i32 0
	%vecinit7 = shufflevector <8 x half> %vecinit, <8 x half> undef, <8 x i32> zeroinitializer			%vecinit7 = shufflevector <8 x half> %vecinit, <8 x half> undef, <8 x i32> zeroinitializer
	%0 = tail call <8 x half> @llvm.fma.v8f16(<8 x half> %sub, <8 x half> %vecinit7, <8 x half> %a) #4			%0 = tail call <8 x half> @llvm.fma.v8f16(<8 x half> %sub, <8 x half> %vecinit7, <8 x half> %a) #4
	ret <8 x half> %0			ret <8 x half> %0
	}			}

	define dso_local half @t_vfmsh_lane_f16(half %a, half %b, <4 x half> %c, i32 %lane) {			define dso_local half @t_vfmsh_lane_f16(half %a, half %b, <4 x half> %c, i32 %lane) {
	; CHECK-LABEL: t_vfmsh_lane_f16:			; CHECK-LABEL: t_vfmsh_lane_f16:
	; CHECK: .Lt_vfmsh_lane_f16$local:			; CHECK: .Lt_vfmsh_lane_f16$local:
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: // %bb.0: // %entry			; CHECK-NEXT: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d2 killed $d2 def $q2			; CHECK-NEXT: // kill: def $d2 killed $d2 def $q2
	; CHECK-NEXT: fmsub h0, h1, h2, h0			; CHECK-NEXT: fmls h0, h1, v2.h[0]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%0 = fsub half 0xH8000, %b			%0 = fsub half 0xH8000, %b
	%extract = extractelement <4 x half> %c, i32 0			%extract = extractelement <4 x half> %c, i32 0
	%1 = tail call half @llvm.fma.f16(half %0, half %extract, half %a)			%1 = tail call half @llvm.fma.f16(half %0, half %extract, half %a)
	ret half %1			ret half %1
	}			}

	define dso_local half @t_vfmsh_laneq_f16(half %a, half %b, <8 x half> %c, i32 %lane) {			define dso_local half @t_vfmsh_laneq_f16(half %a, half %b, <8 x half> %c, i32 %lane) {
	; CHECK-LABEL: t_vfmsh_laneq_f16:			; CHECK-LABEL: t_vfmsh_laneq_f16:
	; CHECK: .Lt_vfmsh_laneq_f16$local:			; CHECK: .Lt_vfmsh_laneq_f16$local:
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: // %bb.0: // %entry			; CHECK-NEXT: // %bb.0: // %entry
	; CHECK-NEXT: fmsub h0, h1, h2, h0			; CHECK-NEXT: fmls h0, h1, v2.h[0]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%0 = fsub half 0xH8000, %b			%0 = fsub half 0xH8000, %b
	%extract = extractelement <8 x half> %c, i32 0			%extract = extractelement <8 x half> %c, i32 0
	%1 = tail call half @llvm.fma.f16(half %0, half %extract, half %a)			%1 = tail call half @llvm.fma.f16(half %0, half %extract, half %a)
	ret half %1			ret half %1
	}			}

	}			}

	define dso_local half @t_vfmah_lane3_f16(half %a, half %b, <4 x half> %c) {			define dso_local half @t_vfmah_lane3_f16(half %a, half %b, <4 x half> %c) {
	; CHECK-LABEL: t_vfmah_lane3_f16:			; CHECK-LABEL: t_vfmah_lane3_f16:
	; CHECK: .Lt_vfmah_lane3_f16$local:			; CHECK: .Lt_vfmah_lane3_f16$local:
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: // %bb.0: // %entry			; CHECK-NEXT: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d2 killed $d2 def $q2			; CHECK-NEXT: // kill: def $d2 killed $d2 def $q2
	; CHECK-NEXT: mov h2, v2.h[3]			; CHECK-NEXT: fmla h0, h1, v2.h[3]
	; CHECK-NEXT: fmadd h0, h1, h2, h0
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%extract = extractelement <4 x half> %c, i32 3			%extract = extractelement <4 x half> %c, i32 3
	%0 = tail call half @llvm.fma.f16(half %b, half %extract, half %a)			%0 = tail call half @llvm.fma.f16(half %b, half %extract, half %a)
	ret half %0			ret half %0
	}			}

	define dso_local half @t_vfmah_laneq7_f16(half %a, half %b, <8 x half> %c) {			define dso_local half @t_vfmah_laneq7_f16(half %a, half %b, <8 x half> %c) {
	; CHECK-LABEL: t_vfmah_laneq7_f16:			; CHECK-LABEL: t_vfmah_laneq7_f16:
	; CHECK: .Lt_vfmah_laneq7_f16$local:			; CHECK: .Lt_vfmah_laneq7_f16$local:
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: // %bb.0: // %entry			; CHECK-NEXT: // %bb.0: // %entry
	; CHECK-NEXT: mov h2, v2.h[7]			; CHECK-NEXT: fmla h0, h1, v2.h[7]
	; CHECK-NEXT: fmadd h0, h1, h2, h0
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%extract = extractelement <8 x half> %c, i32 7			%extract = extractelement <8 x half> %c, i32 7
	%0 = tail call half @llvm.fma.f16(half %b, half %extract, half %a)			%0 = tail call half @llvm.fma.f16(half %b, half %extract, half %a)
	ret half %0			ret half %0
	}			}

	define dso_local half @t_vfmsh_lane3_f16(half %a, half %b, <4 x half> %c) {			define dso_local half @t_vfmsh_lane3_f16(half %a, half %b, <4 x half> %c) {
	; CHECK-LABEL: t_vfmsh_lane3_f16:			; CHECK-LABEL: t_vfmsh_lane3_f16:
	; CHECK: .Lt_vfmsh_lane3_f16$local:			; CHECK: .Lt_vfmsh_lane3_f16$local:
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: // %bb.0: // %entry			; CHECK-NEXT: // %bb.0: // %entry
	; CHECK-NEXT: // kill: def $d2 killed $d2 def $q2			; CHECK-NEXT: // kill: def $d2 killed $d2 def $q2
	; CHECK-NEXT: mov h2, v2.h[3]			; CHECK-NEXT: fmls h0, h1, v2.h[3]
	; CHECK-NEXT: fmsub h0, h1, h2, h0
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%0 = fsub half 0xH8000, %b			%0 = fsub half 0xH8000, %b
	%extract = extractelement <4 x half> %c, i32 3			%extract = extractelement <4 x half> %c, i32 3
	%1 = tail call half @llvm.fma.f16(half %0, half %extract, half %a)			%1 = tail call half @llvm.fma.f16(half %0, half %extract, half %a)
	ret half %1			ret half %1
	}			}

	define dso_local half @t_vfmsh_laneq7_f16(half %a, half %b, <8 x half> %c) {			define dso_local half @t_vfmsh_laneq7_f16(half %a, half %b, <8 x half> %c) {
	; CHECK-LABEL: t_vfmsh_laneq7_f16:			; CHECK-LABEL: t_vfmsh_laneq7_f16:
	; CHECK: .Lt_vfmsh_laneq7_f16$local:			; CHECK: .Lt_vfmsh_laneq7_f16$local:
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: // %bb.0: // %entry			; CHECK-NEXT: // %bb.0: // %entry
	; CHECK-NEXT: mov h2, v2.h[7]			; CHECK-NEXT: fmls h0, h1, v2.h[7]
	; CHECK-NEXT: fmsub h0, h1, h2, h0
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%0 = fsub half 0xH8000, %b			%0 = fsub half 0xH8000, %b
	%extract = extractelement <8 x half> %c, i32 7			%extract = extractelement <8 x half> %c, i32 7
	%1 = tail call half @llvm.fma.f16(half %0, half %extract, half %a)			%1 = tail call half @llvm.fma.f16(half %0, half %extract, half %a)
	ret half %1			ret half %1
	}			}

Revision Contents

Diff 258865

clang/test/CodeGen/aarch64-v8.2a-neon-intrinsics-constrained.c

llvm/lib/Target/AArch64/AArch64InstrFormats.td

llvm/test/CodeGen/AArch64/fp16_intrinsic_lane.ll
