This is an archive of the discontinued LLVM Phabricator instance.

Differential D82141

[sve][acle] Add SVE BFloat16 extensions.
Closed · Public

Authored by fpetrogalli on Jun 18 2020, 8:22 PM.

Details

Reviewers

sdesmalen
ctetreau
efriedma
david-arm
rengolin

Commits

rGef597eda8efc: [sve][acle] Add SVE BFloat16 extensions.

Summary

List of intrinsics:

svfloat32_t svbfdot[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3)
svfloat32_t svbfdot[_n_f32](svfloat32_t op1, svbfloat16_t op2, bfloat16_t op3)
svfloat32_t svbfdot_lane[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3, uint64_t imm_index)

svfloat32_t svbfmmla[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3)

svfloat32_t svbfmlalb[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3)
svfloat32_t svbfmlalb[_n_f32](svfloat32_t op1, svbfloat16_t op2, bfloat16_t op3)
svfloat32_t svbfmlalb_lane[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3, uint64_t imm_index)

svfloat32_t svbfmlalt[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3)
svfloat32_t svbfmlalt[_n_f32](svfloat32_t op1, svbfloat16_t op2, bfloat16_t op3)
svfloat32_t svbfmlalt_lane[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3, uint64_t imm_index)

svbfloat16_t svcvt_bf16[_f32]_m(svbfloat16_t inactive, svbool_t pg, svfloat32_t op)
svbfloat16_t svcvt_bf16[_f32]_x(svbool_t pg, svfloat32_t op)
svbfloat16_t svcvt_bf16[_f32]_z(svbool_t pg, svfloat32_t op)

svbfloat16_t svcvtnt_bf16[_f32]_m(svbfloat16_t even, svbool_t pg, svfloat32_t op)
svbfloat16_t svcvtnt_bf16[_f32]_x(svbfloat16_t even, svbool_t pg, svfloat32_t op)

For reference, see section 7.2 of "Arm C Language Extensions for SVE - Version 00bet4"
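
As a rough illustration of what the `svbfdot` forms listed above compute, here is a minimal scalar sketch in portable C. It is a simplified model, not the SVE instruction: it assumes plain `float` arithmetic and ignores the instruction's own rounding and denormal handling, and the helper names `bf16_to_f32`, `bfdot_ref` and `bfdot_lane_ref` are hypothetical. The lane form assumes the index selects a pair of BFloat16 elements within each 128-bit segment, which is consistent with the 0..3 immediate range checked in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Widen a BFloat16 bit pattern to float. bf16 is the upper 16 bits of an
   IEEE-754 binary32 value, so widening is an exact 16-bit shift. */
static float bf16_to_f32(uint16_t bits) {
    uint32_t wide = (uint32_t)bits << 16;
    float out;
    memcpy(&out, &wide, sizeof out);
    return out;
}

/* Hypothetical scalar model of svbfdot: acc has n float lanes, a and b hold
   2*n bf16 elements. Lane i accumulates the products of pair (2i, 2i+1). */
void bfdot_ref(float *acc, const uint16_t *a, const uint16_t *b, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        float p0 = bf16_to_f32(a[2 * i]) * bf16_to_f32(b[2 * i]);
        float p1 = bf16_to_f32(a[2 * i + 1]) * bf16_to_f32(b[2 * i + 1]);
        acc[i] += p0 + p1;
    }
}

/* Hypothetical model of svbfdot_lane: every float lane in a 128-bit segment
   (4 float lanes, 8 bf16 elements) uses the same indexed bf16 pair of b. */
void bfdot_lane_ref(float *acc, const uint16_t *a, const uint16_t *b,
                    size_t n, unsigned imm_index) {
    assert(imm_index <= 3); /* 4 pairs per 128-bit segment */
    assert(n % 4 == 0);     /* whole segments only */
    for (size_t i = 0; i < n; ++i) {
        size_t pair = 8 * (i / 4) + 2 * imm_index;
        float p0 = bf16_to_f32(a[2 * i]) * bf16_to_f32(b[pair]);
        float p1 = bf16_to_f32(a[2 * i + 1]) * bf16_to_f32(b[pair + 1]);
        acc[i] += p0 + p1;
    }
}
```

The `_n` forms can be modelled by filling `b` with copies of the scalar operand, which matches the `dup.x` splat seen in the Clang tests.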

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fpetrogalli created this revision. Jun 18 2020, 8:22 PM

Herald added a reviewer: rengolin. Jun 18 2020, 8:22 PM

Herald added projects: Restricted Project, Restricted Project.

Herald added subscribers: llvm-commits, cfe-commits, psnobl and 4 others.

Harbormaster completed remote builds in B60931: Diff 271905. Jun 18 2020, 10:20 PM

sdesmalen added inline comments. Jun 19 2020, 2:01 AM

clang/include/clang/Basic/arm_sve.td
502	The types for these intrinsics are always `svfloat32_t` and `svbfloat16_t`, and given their semantics they are unlikely ever to be extended to other types. It is therefore easier to make the LLVM IR intrinsics non-overloaded (i.e. hardcode `llvm_nxv4f32_ty` and `llvm_nxv8bf16_ty`) and use the `IsOverloadNone` flag for these builtins. You can then express this builtin as:
	def SVBFDOT: SInst<"svbfdot[_{0}]", "MMdd", "b", MergeNone, "aarch64_sve_bfdot">;
	and drop the need for the `$` modifier.
506	Similar to the suggestion above to use `"MMdd"` for SVBFDOT, this could use `"MMda"`, and then you don't need the `~` modifier.
	nit: add whitespace above this line.
	nit: the rest of this file tries to align the columns, which makes it a bit easier to read.
1050	nit: redundant comment (same for above)
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfdot.c
28	Testing the edge cases 0 and 3 should be sufficient. (same for all other cases in this patch)
llvm/include/llvm/IR/IntrinsicsAArch64.td
1343	nit: `SVE_bfloat` is not very descriptive, maybe use `SVE_4Vec_BF16` and `SVE_4Vec_BF16_Indexed`?
1806	nit: use `fcvtbf` instead of `cvt` => `int_aarch64_sve_fcvtbf_bf16f32` ?

sdesmalen added subscribers: c-rhodes, kmclaughlin. Jun 19 2020, 2:05 AM

Thank you for the review @sdesmalen!

Francesco

llvm/include/llvm/IR/IntrinsicsAArch64.td
1806	Renamed to `int_aarch64_sve_fcvt_bf16f32` and `int_aarch64_sve_fcvtnt_bf16f32` respectively, because I think it wouldn't make sense to add the `bf` suffix to the `cvtnt` version of the intrinsic.

Harbormaster completed remote builds in B61094: Diff 272170. Jun 19 2020, 3:14 PM

LGTM

llvm/include/llvm/IR/IntrinsicsAArch64.td
1806	I meant to write `int_aarch64_sve_bfcvt_bf16f32`. This seems consistent with all other intrinsics (`fcvt`, `fcvtzu`, `scvtf`, etc.) that use the name of the instruction directly in the name of the intrinsic.

This revision is now accepted and ready to land. Jun 22 2020, 12:35 AM

sdesmalen added inline comments. Jun 22 2020, 12:36 AM

llvm/include/llvm/IR/IntrinsicsAArch64.td
1345	nit: keep this on one line.

Formatting changes. NFC.

Closed by commit rGef597eda8efc: [sve][acle] Add SVE BFloat16 extensions. (authored by fpetrogalli). Jun 22 2020, 10:13 AM

This revision was automatically updated to reflect the committed changes.

Harbormaster failed remote builds in B61262: Diff 272480! Jun 22 2020, 10:14 AM

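Revision Contents

Path	Size
clang/include/clang/Basic/arm_sve.td	19 lines
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfdot.c	40 lines
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfmlalb.c	40 lines
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfmlalt.c	40 lines
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfmmla.c	18 lines
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cvt-bfloat.c	35 lines
clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cvtnt.c	27 lines
clang/utils/TableGen/SveEmitter.cpp	7 lines
llvm/include/llvm/IR/IntrinsicsAArch64.td	26 lines
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td	18 lines
llvm/lib/Target/AArch64/SVEInstrFormats.td	30 lines
llvm/test/CodeGen/AArch64/sve-intrinsics-bfloat.ll	243 lines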

Diff 272480

clang/include/clang/Basic/arm_sve.td

@@ ... @@
 // Load one octoword and replicate (scalar base)
 let ArchGuard = "defined(__ARM_FEATURE_SVE_MATMUL_FP64)" in {
 def SVLD1RO : SInst<"svld1ro[_{2}]", "dPc", "csilUcUsUiUlhfd", MergeNone, "aarch64_sve_ld1ro">;
 }
 let ArchGuard = "defined(__ARM_FEATURE_SVE_MATMUL_FP64) && defined(__ARM_FEATURE_BF16_SCALAR_ARITHMETIC)" in {
 def SVLD1RO_BF : SInst<"svld1ro[_{2}]", "dPc", "b", MergeNone, "aarch64_sve_ld1ro">;
 }

+let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in {
+def SVBFDOT : SInst<"svbfdot[_{0}]", "MMdd", "b", MergeNone, "aarch64_sve_bfdot", [IsOverloadNone]>;
+def SVBFMLALB : SInst<"svbfmlalb[_{0}]", "MMdd", "b", MergeNone, "aarch64_sve_bfmlalb", [IsOverloadNone]>;
+def SVBFMLALT : SInst<"svbfmlalt[_{0}]", "MMdd", "b", MergeNone, "aarch64_sve_bfmlalt", [IsOverloadNone]>;
+def SVBFMMLA : SInst<"svbfmmla[_{0}]", "MMdd", "b", MergeNone, "aarch64_sve_bfmmla", [IsOverloadNone]>;
+def SVBFDOT_N : SInst<"svbfdot[_n_{0}]", "MMda", "b", MergeNone, "aarch64_sve_bfdot", [IsOverloadNone]>;
+def SVBFMLAL_N : SInst<"svbfmlalb[_n_{0}]", "MMda", "b", MergeNone, "aarch64_sve_bfmlalb", [IsOverloadNone]>;
+def SVBFMLALT_N : SInst<"svbfmlalt[_n_{0}]", "MMda", "b", MergeNone, "aarch64_sve_bfmlalt", [IsOverloadNone]>;
+def SVBFDOT_LANE : SInst<"svbfdot_lane[_{0}]", "MMddn", "b", MergeNone, "aarch64_sve_bfdot_lane", [IsOverloadNone], [ImmCheck<3, ImmCheck0_3>]>;
+def SVBFMLALB_LANE : SInst<"svbfmlalb_lane[_{0}]", "MMddn", "b", MergeNone, "aarch64_sve_bfmlalb_lane", [IsOverloadNone], [ImmCheck<3, ImmCheck0_7>]>;
+def SVBFMLALT_LANE : SInst<"svbfmlalt_lane[_{0}]", "MMddn", "b", MergeNone, "aarch64_sve_bfmlalt_lane", [IsOverloadNone], [ImmCheck<3, ImmCheck0_7>]>;
+}

 ////////////////////////////////////////////////////////////////////////////////
 // Stores

 // Store one vector (scalar base)
 def SVST1 : MInst<"svst1[_{d}]", "vPpd", "csilUcUsUiUlhfd", [IsStore], MemEltTyDefault, "aarch64_sve_st1">;
 def SVST1B_S : MInst<"svst1b[_{d}]", "vPAd", "sil", [IsStore], MemEltTyInt8, "aarch64_sve_st1">;
 def SVST1B_U : MInst<"svst1b[_{d}]", "vPEd", "UsUiUl", [IsStore], MemEltTyInt8, "aarch64_sve_st1">;
 def SVST1H_S : MInst<"svst1h[_{d}]", "vPBd", "il", [IsStore], MemEltTyInt16, "aarch64_sve_st1">;
@@ ... @@
 defm SVFCVTZS_S16_F16 : SInstCvtMXZ<"svcvt_s16[_f16]", "ddPO", "dPO", "s", "aarch64_sve_fcvtzs", [IsOverloadCvt]>;
 defm SVFCVTZS_S32_F16 : SInstCvtMXZ<"svcvt_s32[_f16]", "ddPO", "dPO", "i", "aarch64_sve_fcvtzs_i32f16">;
 defm SVFCVTZS_S64_F16 : SInstCvtMXZ<"svcvt_s64[_f16]", "ddPO", "dPO", "l", "aarch64_sve_fcvtzs_i64f16">;

 // svcvt_s##_f32
 defm SVFCVTZS_S32_F32 : SInstCvtMXZ<"svcvt_s32[_f32]", "ddPM", "dPM", "i", "aarch64_sve_fcvtzs", [IsOverloadCvt]>;
 defm SVFCVTZS_S64_F32 : SInstCvtMXZ<"svcvt_s64[_f32]", "ddPM", "dPM", "l", "aarch64_sve_fcvtzs_i64f32">;

+let ArchGuard = "defined(__ARM_FEATURE_SVE_BF16)" in {
+defm SVCVT_BF16_F32 : SInstCvtMXZ<"svcvt_bf16[_f32]", "ddPM", "dPM", "b", "aarch64_sve_fcvt_bf16f32">;
+defm SVCVTNT_BF16_F32 : SInstCvtMX<"svcvtnt_bf16[_f32]", "ddPM", "dPM", "b", "aarch64_sve_fcvtnt_bf16f32">;
+}

 // svcvt_s##_f64
 defm SVFCVTZS_S32_F64 : SInstCvtMXZ<"svcvt_s32[_f64]", "ttPd", "tPd", "d", "aarch64_sve_fcvtzs_i32f64">;
 defm SVFCVTZS_S64_F64 : SInstCvtMXZ<"svcvt_s64[_f64]", "ddPN", "dPN", "l", "aarch64_sve_fcvtzs", [IsOverloadCvt]>;

 // svcvt_u##_f16
 defm SVFCVTZU_U16_F16 : SInstCvtMXZ<"svcvt_u16[_f16]", "ddPO", "dPO", "Us", "aarch64_sve_fcvtzu", [IsOverloadCvt]>;
 defm SVFCVTZU_U32_F16 : SInstCvtMXZ<"svcvt_u32[_f16]", "ddPO", "dPO", "Ui", "aarch64_sve_fcvtzu_i32f16">;
 defm SVFCVTZU_U64_F16 : SInstCvtMXZ<"svcvt_u64[_f16]", "ddPO", "dPO", "Ul", "aarch64_sve_fcvtzu_i64f16">;

clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfdot.c

This file was added.

				// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -D__ARM_FEATURE_SVE_BF16 -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +bf16 -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s
				// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -D__ARM_FEATURE_SVE_BF16 -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +bf16 -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s

				#include <arm_sve.h>

				#ifdef SVE_OVERLOADED_FORMS
				// A simple used,unused... macro, long enough to represent any SVE builtin.
				#define SVE_ACLE_FUNC(A1, A2_UNUSED, A3, A4_UNUSED) A1##A3
				#else
				#define SVE_ACLE_FUNC(A1, A2, A3, A4) A1##A2##A3##A4
				#endif

				svfloat32_t test_bfdot_f32(svfloat32_t x, svbfloat16_t y, svbfloat16_t z) {
				// CHECK-LABEL: @test_bfdot_f32(
				// CHECK: %[[RET:.*]] = call <vscale x 4 x float> @llvm.aarch64.sve.bfdot(<vscale x 4 x float> %x, <vscale x 8 x bfloat> %y, <vscale x 8 x bfloat> %z)
				// CHECK: ret <vscale x 4 x float> %[[RET]]
				return SVE_ACLE_FUNC(svbfdot, _f32, , )(x, y, z);
				}

				svfloat32_t test_bfdot_lane_0_f32(svfloat32_t x, svbfloat16_t y, svbfloat16_t z) {
				// CHECK-LABEL: @test_bfdot_lane_0_f32(
				// CHECK: %[[RET:.*]] = call <vscale x 4 x float> @llvm.aarch64.sve.bfdot.lane(<vscale x 4 x float> %x, <vscale x 8 x bfloat> %y, <vscale x 8 x bfloat> %z, i64 0)
				// CHECK: ret <vscale x 4 x float> %[[RET]]
				return SVE_ACLE_FUNC(svbfdot_lane, _f32, , )(x, y, z, 0);
				}

				svfloat32_t test_bfdot_lane_3_f32(svfloat32_t x, svbfloat16_t y, svbfloat16_t z) {
				// CHECK-LABEL: @test_bfdot_lane_3_f32(
				// CHECK: %[[RET:.*]] = call <vscale x 4 x float> @llvm.aarch64.sve.bfdot.lane(<vscale x 4 x float> %x, <vscale x 8 x bfloat> %y, <vscale x 8 x bfloat> %z, i64 3)
				// CHECK: ret <vscale x 4 x float> %[[RET]]
				return SVE_ACLE_FUNC(svbfdot_lane, _f32, , )(x, y, z, 3);
				}

				svfloat32_t test_bfdot_n_f32(svfloat32_t x, svbfloat16_t y, bfloat16_t z) {
				// CHECK-LABEL: @test_bfdot_n_f32(
				// CHECK: %[[SPLAT:.*]] = call <vscale x 8 x bfloat> @llvm.aarch64.sve.dup.x.nxv8bf16(bfloat %z)
				// CHECK: %[[RET:.*]] = call <vscale x 4 x float> @llvm.aarch64.sve.bfdot(<vscale x 4 x float> %x, <vscale x 8 x bfloat> %y, <vscale x 8 x bfloat> %[[SPLAT]])
				// CHECK: ret <vscale x 4 x float> %[[RET]]
				return SVE_ACLE_FUNC(svbfdot, _n_f32, , )(x, y, z);
				}
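
The `SVE_ACLE_FUNC` macro used throughout these tests lets each file run twice, once with the fully spelled-out intrinsic name and once with the overloaded name. The sketch below reproduces the macro on stand-in functions so that it compiles on any host; `svdemo_f32`, `svdemo` and `call_demo` are hypothetical names, not ACLE intrinsics.

```c
#include <assert.h>

/* Same macro as in the tests: with SVE_OVERLOADED_FORMS defined, the type
   suffixes (arguments 2 and 4) are dropped so the overloaded name is used. */
#ifdef SVE_OVERLOADED_FORMS
#define SVE_ACLE_FUNC(A1, A2_UNUSED, A3, A4_UNUSED) A1##A3
#else
#define SVE_ACLE_FUNC(A1, A2, A3, A4) A1##A2##A3##A4
#endif

/* Hypothetical stand-ins for the two spellings of one intrinsic. */
int svdemo_f32(int x) { return x + 1; }   /* fully spelled-out form */
int svdemo(int x)     { return x + 100; } /* overloaded form */

int call_demo(int x) {
    assert(x >= 0); /* keep the toy arithmetic away from overflow concerns */
    /* Expands to svdemo_f32(x) by default, or to svdemo(x) when the file is
       compiled with -DSVE_OVERLOADED_FORMS. */
    return SVE_ACLE_FUNC(svdemo, _f32, , )(x);
}
```

Compiled without `-DSVE_OVERLOADED_FORMS`, `call_demo` reaches the `_f32` stand-in; the second RUN line in each test corresponds to the other expansion.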

clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfmlalb.c

This file was added.

				// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -D__ARM_FEATURE_SVE_BF16 -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +bf16 -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s
				// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -D__ARM_FEATURE_SVE_BF16 -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +bf16 -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s

				#include <arm_sve.h>

				#ifdef SVE_OVERLOADED_FORMS
				// A simple used,unused... macro, long enough to represent any SVE builtin.
				#define SVE_ACLE_FUNC(A1, A2_UNUSED, A3, A4_UNUSED) A1##A3
				#else
				#define SVE_ACLE_FUNC(A1, A2, A3, A4) A1##A2##A3##A4
				#endif

				svfloat32_t test_svbfmlalb_f32(svfloat32_t x, svbfloat16_t y, svbfloat16_t z) {
				// CHECK-LABEL: @test_svbfmlalb_f32(
				// CHECK: %[[RET:.*]] = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalb(<vscale x 4 x float> %x, <vscale x 8 x bfloat> %y, <vscale x 8 x bfloat> %z)
				// CHECK: ret <vscale x 4 x float> %[[RET]]
				return SVE_ACLE_FUNC(svbfmlalb, _f32, , )(x, y, z);
				}

				svfloat32_t test_bfmlalb_lane_0_f32(svfloat32_t x, svbfloat16_t y, svbfloat16_t z) {
				// CHECK-LABEL: @test_bfmlalb_lane_0_f32(
				// CHECK: %[[RET:.*]] = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalb.lane(<vscale x 4 x float> %x, <vscale x 8 x bfloat> %y, <vscale x 8 x bfloat> %z, i64 0)
				// CHECK: ret <vscale x 4 x float> %[[RET]]
				return SVE_ACLE_FUNC(svbfmlalb_lane, _f32, , )(x, y, z, 0);
				}

				svfloat32_t test_bfmlalb_lane_7_f32(svfloat32_t x, svbfloat16_t y, svbfloat16_t z) {
				// CHECK-LABEL: @test_bfmlalb_lane_7_f32(
				// CHECK: %[[RET:.*]] = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalb.lane(<vscale x 4 x float> %x, <vscale x 8 x bfloat> %y, <vscale x 8 x bfloat> %z, i64 7)
				// CHECK: ret <vscale x 4 x float> %[[RET]]
				return SVE_ACLE_FUNC(svbfmlalb_lane, _f32, , )(x, y, z, 7);
				}

				svfloat32_t test_bfmlalb_n_f32(svfloat32_t x, svbfloat16_t y, bfloat16_t z) {
				// CHECK-LABEL: @test_bfmlalb_n_f32(
				// CHECK: %[[SPLAT:.*]] = call <vscale x 8 x bfloat> @llvm.aarch64.sve.dup.x.nxv8bf16(bfloat %z)
				// CHECK: %[[RET:.*]] = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalb(<vscale x 4 x float> %x, <vscale x 8 x bfloat> %y, <vscale x 8 x bfloat> %[[SPLAT]])
				// CHECK: ret <vscale x 4 x float> %[[RET]]
				return SVE_ACLE_FUNC(svbfmlalb, _n_f32, , )(x, y, z);
				}

clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfmlalt.c

This file was added.

				// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -D__ARM_FEATURE_SVE_BF16 -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +bf16 -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s
				// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -D__ARM_FEATURE_SVE_BF16 -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +bf16 -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s

				#include <arm_sve.h>

				#ifdef SVE_OVERLOADED_FORMS
				// A simple used,unused... macro, long enough to represent any SVE builtin.
				#define SVE_ACLE_FUNC(A1, A2_UNUSED, A3, A4_UNUSED) A1##A3
				#else
				#define SVE_ACLE_FUNC(A1, A2, A3, A4) A1##A2##A3##A4
				#endif

				svfloat32_t test_svbfmlalt_f32(svfloat32_t x, svbfloat16_t y, svbfloat16_t z) {
				// CHECK-LABEL: @test_svbfmlalt_f32(
				// CHECK: %[[RET:.*]] = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalt(<vscale x 4 x float> %x, <vscale x 8 x bfloat> %y, <vscale x 8 x bfloat> %z)
				// CHECK: ret <vscale x 4 x float> %[[RET]]
				return SVE_ACLE_FUNC(svbfmlalt, _f32, , )(x, y, z);
				}

				svfloat32_t test_bfmlalt_lane_0_f32(svfloat32_t x, svbfloat16_t y, svbfloat16_t z) {
				// CHECK-LABEL: @test_bfmlalt_lane_0_f32(
				// CHECK: %[[RET:.*]] = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalt.lane(<vscale x 4 x float> %x, <vscale x 8 x bfloat> %y, <vscale x 8 x bfloat> %z, i64 0)
				// CHECK: ret <vscale x 4 x float> %[[RET]]
				return SVE_ACLE_FUNC(svbfmlalt_lane, _f32, , )(x, y, z, 0);
				}

				svfloat32_t test_bfmlalt_lane_7_f32(svfloat32_t x, svbfloat16_t y, svbfloat16_t z) {
				// CHECK-LABEL: @test_bfmlalt_lane_7_f32(
				// CHECK: %[[RET:.*]] = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalt.lane(<vscale x 4 x float> %x, <vscale x 8 x bfloat> %y, <vscale x 8 x bfloat> %z, i64 7)
				// CHECK: ret <vscale x 4 x float> %[[RET]]
				return SVE_ACLE_FUNC(svbfmlalt_lane, _f32, , )(x, y, z, 7);
				}

				svfloat32_t test_bfmlalt_n_f32(svfloat32_t x, svbfloat16_t y, bfloat16_t z) {
				// CHECK-LABEL: @test_bfmlalt_n_f32(
				// CHECK: %[[SPLAT:.*]] = call <vscale x 8 x bfloat> @llvm.aarch64.sve.dup.x.nxv8bf16(bfloat %z)
				// CHECK: %[[RET:.*]] = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalt(<vscale x 4 x float> %x, <vscale x 8 x bfloat> %y, <vscale x 8 x bfloat> %[[SPLAT]])
				// CHECK: ret <vscale x 4 x float> %[[RET]]
				return SVE_ACLE_FUNC(svbfmlalt, _n_f32, , )(x, y, z);
				}
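
For readers unfamiliar with the "bottom"/"top" naming, the following scalar sketch models `svbfmlalb` and `svbfmlalt` in portable C. It assumes, as I understand the instructions, that the "b" form uses the even-numbered BFloat16 elements and the "t" form the odd-numbered ones, and that the lane form picks one element of the third operand per 128-bit segment (hence the 0..7 range exercised in the tests above). Rounding details of the real instructions are not modelled, and `bfmlal_ref` and `bfmlal_lane_ref` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* bf16 is the upper half of a binary32 value, so widening is exact. */
static float bf16_to_f32(uint16_t bits) {
    uint32_t wide = (uint32_t)bits << 16;
    float out;
    memcpy(&out, &wide, sizeof out);
    return out;
}

/* top == 0 models svbfmlalb (even-numbered bf16 elements),
   top == 1 models svbfmlalt (odd-numbered bf16 elements).
   acc has n float lanes; a and b hold 2*n bf16 elements. */
void bfmlal_ref(float *acc, const uint16_t *a, const uint16_t *b,
                size_t n, int top) {
    assert(top == 0 || top == 1);
    for (size_t i = 0; i < n; ++i)
        acc[i] += bf16_to_f32(a[2 * i + top]) * bf16_to_f32(b[2 * i + top]);
}

/* Lane form: within each 128-bit segment (4 float lanes, 8 bf16 elements)
   the multiplier is the single element b[8 * segment + imm_index]. */
void bfmlal_lane_ref(float *acc, const uint16_t *a, const uint16_t *b,
                     size_t n, int top, unsigned imm_index) {
    assert(top == 0 || top == 1);
    assert(imm_index <= 7); /* 8 bf16 elements per 128-bit segment */
    assert(n % 4 == 0);     /* whole segments only */
    for (size_t i = 0; i < n; ++i) {
        size_t segment = i / 4;
        acc[i] += bf16_to_f32(a[2 * i + top]) *
                  bf16_to_f32(b[8 * segment + imm_index]);
    }
}
```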

clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfmmla.c

This file was added.

				// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -D__ARM_FEATURE_SVE_BF16 -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s
				// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -D__ARM_FEATURE_SVE_BF16 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s

				#include <arm_sve.h>

				#ifdef SVE_OVERLOADED_FORMS
				// A simple used,unused... macro, long enough to represent any SVE builtin.
				#define SVE_ACLE_FUNC(A1, A2_UNUSED, A3, A4_UNUSED) A1##A3
				#else
				#define SVE_ACLE_FUNC(A1, A2, A3, A4) A1##A2##A3##A4
				#endif

				svfloat32_t test_bfmmla_f32(svfloat32_t x, svbfloat16_t y, svbfloat16_t z) {
				// CHECK-LABEL: @test_bfmmla_f32(
				// CHECK: %[[RET:.*]] = call <vscale x 4 x float> @llvm.aarch64.sve.bfmmla(<vscale x 4 x float> %x, <vscale x 8 x bfloat> %y, <vscale x 8 x bfloat> %z)
				// CHECK: ret <vscale x 4 x float> %[[RET]]
				return SVE_ACLE_FUNC(svbfmmla, _f32, , )(x, y, z);
				}
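
The matrix form is the least obvious of the new intrinsics, so here is a small scalar sketch of one 128-bit segment of `svbfmmla`. It assumes, based on my reading of the BFMMLA description, that the second operand holds a 2x4 row-major BFloat16 matrix, the third holds the 4x2 matrix stored as two groups of four elements (one group per result column), and the accumulator is a 2x2 row-major `float` matrix. Rounding behaviour is not modelled, and `bfmmla_segment_ref` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* bf16 is the upper half of a binary32 value, so widening is exact. */
static float bf16_to_f32(uint16_t bits) {
    uint32_t wide = (uint32_t)bits << 16;
    float out;
    memcpy(&out, &wide, sizeof out);
    return out;
}

/* One 128-bit segment: acc is 2x2 floats (row-major), a is 2x4 bf16
   (row-major), b is 4x2 bf16 stored so that column j occupies b[4*j..4*j+3].
   Computes acc[i][j] += sum over k of a[i][k] * b[k][j]. */
void bfmmla_segment_ref(float acc[4], const uint16_t a[8], const uint16_t b[8]) {
    assert(acc && a && b);
    for (size_t i = 0; i < 2; ++i) {
        for (size_t j = 0; j < 2; ++j) {
            float sum = 0.0f;
            for (size_t k = 0; k < 4; ++k)
                sum += bf16_to_f32(a[4 * i + k]) * bf16_to_f32(b[4 * j + k]);
            acc[2 * i + j] += sum;
        }
    }
}
```

A full-vector model would simply apply this function to each 128-bit segment in turn.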

clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cvt-bfloat.c

This file was added.

				// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -D__ARM_FEATURE_SVE_BF16 -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +bf16 -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s
				// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -D__ARM_FEATURE_SVE_BF16 -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +bf16 -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s

				#include <arm_sve.h>

				#ifdef SVE_OVERLOADED_FORMS
				// A simple used,unused... macro, long enough to represent any SVE builtin.
				#define SVE_ACLE_FUNC(A1, A2_UNUSED, A3, A4_UNUSED) A1##A3
				#else
				#define SVE_ACLE_FUNC(A1, A2, A3, A4) A1##A2##A3##A4
				#endif

				svbfloat16_t test_svcvt_bf16_f32_x(svbool_t pg, svfloat32_t op) {
				// CHECK-LABEL: test_svcvt_bf16_f32_x
				// CHECK: %[[PG:.*]] = call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> %pg)
				// CHECK: %[[INTRINSIC:.*]] = call <vscale x 8 x bfloat> @llvm.aarch64.sve.fcvt.bf16f32(<vscale x 8 x bfloat> undef, <vscale x 8 x i1> %[[PG]], <vscale x 4 x float> %op)
				// CHECK: ret <vscale x 8 x bfloat> %[[INTRINSIC]]
				return SVE_ACLE_FUNC(svcvt_bf16, _f32, _x, )(pg, op);
				}

				svbfloat16_t test_svcvt_bf16_f32_z(svbool_t pg, svfloat32_t op) {
				// CHECK-LABEL: test_svcvt_bf16_f32_z
				// CHECK: %[[PG:.*]] = call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> %pg)
				// CHECK: %[[INTRINSIC:.*]] = call <vscale x 8 x bfloat> @llvm.aarch64.sve.fcvt.bf16f32(<vscale x 8 x bfloat> zeroinitializer, <vscale x 8 x i1> %[[PG]], <vscale x 4 x float> %op)
				// CHECK: ret <vscale x 8 x bfloat> %[[INTRINSIC]]
				return SVE_ACLE_FUNC(svcvt_bf16, _f32, _z, )(pg, op);
				}

				svbfloat16_t test_svcvt_bf16_f32_m(svbfloat16_t inactive, svbool_t pg, svfloat32_t op) {
				// CHECK-LABEL: test_svcvt_bf16_f32_m
				// CHECK: %[[PG:.*]] = call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> %pg)
				// CHECK: %[[INTRINSIC:.*]] = call <vscale x 8 x bfloat> @llvm.aarch64.sve.fcvt.bf16f32(<vscale x 8 x bfloat> %inactive, <vscale x 8 x i1> %[[PG]], <vscale x 4 x float> %op)
				// CHECK: ret <vscale x 8 x bfloat> %[[INTRINSIC]]
				return SVE_ACLE_FUNC(svcvt_bf16, _f32, _m, )(inactive, pg, op);
				}

clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cvtnt.c

This file was added.

				// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -D__ARM_FEATURE_SVE_BF16 -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +bf16 -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s
				// RUN: %clang_cc1 -D__ARM_FEATURE_SVE -D__ARM_FEATURE_SVE_BF16 -D__ARM_FEATURE_BF16_SCALAR_ARITHMETIC -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +bf16 -fallow-half-arguments-and-returns -S -O1 -Werror -Wall -emit-llvm -o - %s | FileCheck %s

				#include <arm_sve.h>

				#ifdef SVE_OVERLOADED_FORMS
				// A simple used,unused... macro, long enough to represent any SVE builtin.
				#define SVE_ACLE_FUNC(A1, A2_UNUSED, A3, A4_UNUSED) A1##A3
				#else
				#define SVE_ACLE_FUNC(A1, A2, A3, A4) A1##A2##A3##A4
				#endif

				svbfloat16_t test_svcvtnt_bf16_f32_x(svbool_t pg, svfloat32_t op) {
				// CHECK-LABEL: test_svcvtnt_bf16_f32_x
				// CHECK: %[[PG:.*]] = call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> %pg)
				// CHECK: %[[INTRINSIC:.*]] = call <vscale x 8 x bfloat> @llvm.aarch64.sve.fcvtnt.bf16f32(<vscale x 8 x bfloat> undef, <vscale x 8 x i1> %[[PG]], <vscale x 4 x float> %op)
				// CHECK: ret <vscale x 8 x bfloat> %[[INTRINSIC]]
				return SVE_ACLE_FUNC(svcvtnt_bf16, _f32, _x, )(pg, op);
				}

				svbfloat16_t test_svcvtnt_bf16_f32_m(svbfloat16_t inactive, svbool_t pg, svfloat32_t op) {
				// CHECK-LABEL: test_svcvtnt_bf16_f32_m
				// CHECK: %[[PG:.*]] = call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> %pg)
				// CHECK: %[[INTRINSIC:.*]] = call <vscale x 8 x bfloat> @llvm.aarch64.sve.fcvtnt.bf16f32(<vscale x 8 x bfloat> %inactive, <vscale x 8 x i1> %[[PG]], <vscale x 4 x float> %op)
				// CHECK: ret <vscale x 8 x bfloat> %[[INTRINSIC]]
				return SVE_ACLE_FUNC(svcvtnt_bf16, _f32, _m, )(inactive, pg, op);
				}
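
To show why `svcvtnt_bf16` takes an existing BFloat16 vector (the `even` operand in the summary), here is a scalar sketch of the "narrow top" behaviour. It assumes, as I understand the BFCVTNT instruction, that converted values are written to the odd-numbered 16-bit elements while even-numbered elements and inactive lanes are left untouched, and it assumes round-to-nearest-even. `f32_to_bf16_rne` and `cvtnt_bf16_ref` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Narrow a float to a bf16 bit pattern with round-to-nearest-even. */
static uint16_t f32_to_bf16_rne(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    if ((u & 0x7FFFFFFFu) > 0x7F800000u)
        return (uint16_t)((u >> 16) | 0x0040u); /* keep NaN quiet */
    u += 0x7FFFu + ((u >> 16) & 1u);
    return (uint16_t)(u >> 16);
}

/* vec holds 2*n bf16 elements and is updated in place: for each active
   float lane i the converted value lands in the odd slot 2*i + 1. */
void cvtnt_bf16_ref(uint16_t *vec, const bool *pg, const float *op, size_t n) {
    assert(vec && pg && op);
    for (size_t i = 0; i < n; ++i) {
        if (pg[i])
            vec[2 * i + 1] = f32_to_bf16_rne(op[i]);
    }
}
```

Under the same assumption, a plain conversion fills the even slots and a following narrow-top conversion fills the odd ones, so two `float` vectors can be packed into one BFloat16 vector.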

clang/utils/TableGen/SveEmitter.cpp

@@ ... @@ public:
 bool isSigned() const { return Signed; }
 bool isImmediate() const { return Immediate; }
 bool isScalar() const { return NumVectors == 0; }
 bool isVector() const { return NumVectors > 0; }
 bool isScalableVector() const { return isVector() && IsScalable; }
 bool isChar() const { return ElementBitwidth == 8; }
 bool isVoid() const { return Void & !Pointer; }
 bool isDefault() const { return DefaultType; }
-bool isFloat() const { return Float; }
-bool isBFloat() const { return BFloat; }
+bool isFloat() const { return Float && !BFloat; }
+bool isBFloat() const { return BFloat && !Float; }
 bool isFloatingPoint() const { return Float || BFloat; }
 bool isInteger() const { return !isFloatingPoint() && !Predicate; }
 bool isScalarPredicate() const {
 return !isFloatingPoint() && Predicate && NumVectors == 0;
 }
 bool isPredicateVector() const { return Predicate; }
 bool isPredicatePattern() const { return PredicatePattern; }
 bool isPrefetchOp() const { return PrefetchOp; }
@@ ... @@ case 'f':
 ElementBitwidth = 32;
 break;
 case 'd':
 Float = true;
 ElementBitwidth = 64;
 break;
 case 'b':
 BFloat = true;
+Float = false;
 ElementBitwidth = 16;
 break;
 default:
 llvm_unreachable("Unhandled type code!");
 }
 }
 assert(ElementBitwidth != ~0U && "Bad element bitwidth!");
 }
@@ ... @@ case 'm':
 Float = false;
 ElementBitwidth = Bitwidth = 32;
 NumVectors = 0;
 break;
 case 'n':
 Predicate = false;
 Signed = false;
 Float = false;
+BFloat = false;
 ElementBitwidth = Bitwidth = 64;
 NumVectors = 0;
 break;
 case 'w':
 ElementBitwidth = 64;
 break;
 case 'j':
 ElementBitwidth = Bitwidth = 64;
@@ ... @@ void SVEType::applyModifier(char Mod) {
 case 'O':
 Predicate = false;
 Float = true;
 ElementBitwidth = 16;
 break;
 case 'M':
 Predicate = false;
 Float = true;
+BFloat = false;
 ElementBitwidth = 32;
 break;
 case 'N':
 Predicate = false;
 Float = true;
 ElementBitwidth = 64;
 break;
 case 'Q':
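
The SveEmitter change keeps the `Float` and `BFloat` flags mutually exclusive, so that a prototype such as `"MMdd"` applied to the `"b"` type can switch the return type from BFloat16 to 32-bit float cleanly. The following C sketch mirrors only that flag handling; the struct and function names are hypothetical and the rest of `SVEType` is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cut-down stand-in for the flags in SVEType. */
struct sve_type {
    bool is_float;
    bool is_bfloat;
    unsigned element_bitwidth;
};

/* Mirrors the patched 'b' type code: selecting bfloat clears Float. */
void apply_typespec_b(struct sve_type *t) {
    assert(t);
    t->is_bfloat = true;
    t->is_float = false;
    t->element_bitwidth = 16;
}

/* Mirrors the patched 'M' modifier: switching to 32-bit float clears BFloat. */
void apply_modifier_M(struct sve_type *t) {
    assert(t);
    t->is_float = true;
    t->is_bfloat = false;
    t->element_bitwidth = 32;
}

/* Mirrors the patched predicates: each one excludes the other flag. */
bool is_float(const struct sve_type *t)  { return t->is_float && !t->is_bfloat; }
bool is_bfloat(const struct sve_type *t) { return t->is_bfloat && !t->is_float; }
bool is_floating_point(const struct sve_type *t) {
    return t->is_float || t->is_bfloat;
}
```

In the patched code either the setters or the stricter predicates would suffice to keep the queries unambiguous; the patch applies both.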

llvm/include/llvm/IR/IntrinsicsAArch64.td

Show First 20 Lines • Show All 1,334 Lines • ▼ Show 20 Lines	: Intrinsic<[],
],		],
[IntrInaccessibleMemOrArgMemOnly, ImmArg<ArgIndex<3>>]>;		[IntrInaccessibleMemOrArgMemOnly, ImmArg<ArgIndex<3>>]>;

class SVE_MatMul_Intrinsic		class SVE_MatMul_Intrinsic
: Intrinsic<[llvm_anyvector_ty],		: Intrinsic<[llvm_anyvector_ty],
[LLVMMatchType<0>, LLVMSubdivide4VectorType<0>, LLVMSubdivide4VectorType<0>],		[LLVMMatchType<0>, LLVMSubdivide4VectorType<0>, LLVMSubdivide4VectorType<0>],
[IntrNoMem]>;		[IntrNoMem]>;

		class SVE_4Vec_BF16
		sdesmalenUnsubmitted Done Reply Inline Actions nit: `SVE_bfloat` is not very descriptive, maybe use `SVE_4Vec_BF16` and `SVE_4Vec_BF16_Indexed`? sdesmalen: nit: `SVE_bfloat` is not very descriptive, maybe use `SVE_4Vec_BF16` and…
		: Intrinsic<[llvm_nxv4f32_ty],
		[llvm_nxv4f32_ty, llvm_nxv8bf16_ty, llvm_nxv8bf16_ty],
		sdesmalenUnsubmitted Not Done Reply Inline Actions nit: keep this on one line. sdesmalen: nit: keep this on one line.
		[IntrNoMem]>;

		class SVE_4Vec_BF16_Indexed
		: Intrinsic<[llvm_nxv4f32_ty],
		[llvm_nxv4f32_ty, llvm_nxv8bf16_ty, llvm_nxv8bf16_ty, llvm_i64_ty],
		[IntrNoMem, ImmArg<ArgIndex<3>>]>;

//		//
// Vector tuple creation intrinsics (ACLE)		// Vector tuple creation intrinsics (ACLE)
//		//

def int_aarch64_sve_tuple_create2 : AdvSIMD_SVE_Create_2Vector_Tuple;		def int_aarch64_sve_tuple_create2 : AdvSIMD_SVE_Create_2Vector_Tuple;
def int_aarch64_sve_tuple_create3 : AdvSIMD_SVE_Create_3Vector_Tuple;		def int_aarch64_sve_tuple_create3 : AdvSIMD_SVE_Create_3Vector_Tuple;
def int_aarch64_sve_tuple_create4 : AdvSIMD_SVE_Create_4Vector_Tuple;		def int_aarch64_sve_tuple_create4 : AdvSIMD_SVE_Create_4Vector_Tuple;

def int_aarch64_sve_fcmpne : AdvSIMD_SVE_Compare_Intrinsic;		def int_aarch64_sve_fcmpne : AdvSIMD_SVE_Compare_Intrinsic;
def int_aarch64_sve_fcmpuo : AdvSIMD_SVE_Compare_Intrinsic;		def int_aarch64_sve_fcmpuo : AdvSIMD_SVE_Compare_Intrinsic;

def int_aarch64_sve_fcvtzs_i32f16 : Builtin_SVCVT<"svcvt_s32_f16_m", llvm_nxv4i32_ty, llvm_nxv4i1_ty, llvm_nxv8f16_ty>;		def int_aarch64_sve_fcvtzs_i32f16 : Builtin_SVCVT<"svcvt_s32_f16_m", llvm_nxv4i32_ty, llvm_nxv4i1_ty, llvm_nxv8f16_ty>;
def int_aarch64_sve_fcvtzs_i32f64 : Builtin_SVCVT<"svcvt_s32_f64_m", llvm_nxv4i32_ty, llvm_nxv2i1_ty, llvm_nxv2f64_ty>;		def int_aarch64_sve_fcvtzs_i32f64 : Builtin_SVCVT<"svcvt_s32_f64_m", llvm_nxv4i32_ty, llvm_nxv2i1_ty, llvm_nxv2f64_ty>;
def int_aarch64_sve_fcvtzs_i64f16 : Builtin_SVCVT<"svcvt_s64_f16_m", llvm_nxv2i64_ty, llvm_nxv2i1_ty, llvm_nxv8f16_ty>;		def int_aarch64_sve_fcvtzs_i64f16 : Builtin_SVCVT<"svcvt_s64_f16_m", llvm_nxv2i64_ty, llvm_nxv2i1_ty, llvm_nxv8f16_ty>;
def int_aarch64_sve_fcvtzs_i64f32 : Builtin_SVCVT<"svcvt_s64_f32_m", llvm_nxv2i64_ty, llvm_nxv2i1_ty, llvm_nxv4f32_ty>;		def int_aarch64_sve_fcvtzs_i64f32 : Builtin_SVCVT<"svcvt_s64_f32_m", llvm_nxv2i64_ty, llvm_nxv2i1_ty, llvm_nxv4f32_ty>;

		def int_aarch64_sve_fcvt_bf16f32 : Builtin_SVCVT<"svcvt_bf16_f32_m", llvm_nxv8bf16_ty, llvm_nxv8i1_ty, llvm_nxv4f32_ty>;
		sdesmalen [Done]: nit: use `fcvtbf` instead of `cvt` => `int_aarch64_sve_fcvtbf_bf16f32`?
		fpetrogalli (author) [Done]: Renamed to `int_aarch64_sve_fcvt_bf16f32` and `int_aarch64_sve_fcvtnt_bf16f32` respectively, because I think it wouldn't make sense to add the `bf` suffix to the `cvtnt` version of the intrinsic.
		sdesmalen [Not Done]: I meant to write `int_aarch64_sve_bfcvt_bf16f32`. This seems consistent with all other intrinsics (`fcvt`, `fcvtzu`, `scvtf`, etc.) that use the name of the instruction directly in the name of the intrinsic.
		def int_aarch64_sve_fcvtnt_bf16f32 : Builtin_SVCVT<"svcvtnt_bf16_f32_m", llvm_nxv8bf16_ty, llvm_nxv8i1_ty, llvm_nxv4f32_ty>;

def int_aarch64_sve_fcvtzu_i32f16 : Builtin_SVCVT<"svcvt_u32_f16_m", llvm_nxv4i32_ty, llvm_nxv4i1_ty, llvm_nxv8f16_ty>;		def int_aarch64_sve_fcvtzu_i32f16 : Builtin_SVCVT<"svcvt_u32_f16_m", llvm_nxv4i32_ty, llvm_nxv4i1_ty, llvm_nxv8f16_ty>;
def int_aarch64_sve_fcvtzu_i32f64 : Builtin_SVCVT<"svcvt_u32_f64_m", llvm_nxv4i32_ty, llvm_nxv2i1_ty, llvm_nxv2f64_ty>;		def int_aarch64_sve_fcvtzu_i32f64 : Builtin_SVCVT<"svcvt_u32_f64_m", llvm_nxv4i32_ty, llvm_nxv2i1_ty, llvm_nxv2f64_ty>;
def int_aarch64_sve_fcvtzu_i64f16 : Builtin_SVCVT<"svcvt_u64_f16_m", llvm_nxv2i64_ty, llvm_nxv2i1_ty, llvm_nxv8f16_ty>;		def int_aarch64_sve_fcvtzu_i64f16 : Builtin_SVCVT<"svcvt_u64_f16_m", llvm_nxv2i64_ty, llvm_nxv2i1_ty, llvm_nxv8f16_ty>;
def int_aarch64_sve_fcvtzu_i64f32 : Builtin_SVCVT<"svcvt_u64_f32_m", llvm_nxv2i64_ty, llvm_nxv2i1_ty, llvm_nxv4f32_ty>;		def int_aarch64_sve_fcvtzu_i64f32 : Builtin_SVCVT<"svcvt_u64_f32_m", llvm_nxv2i64_ty, llvm_nxv2i1_ty, llvm_nxv4f32_ty>;

def int_aarch64_sve_fcvt_f16f32 : Builtin_SVCVT<"svcvt_f16_f32_m", llvm_nxv8f16_ty, llvm_nxv4i1_ty, llvm_nxv4f32_ty>;		def int_aarch64_sve_fcvt_f16f32 : Builtin_SVCVT<"svcvt_f16_f32_m", llvm_nxv8f16_ty, llvm_nxv4i1_ty, llvm_nxv4f32_ty>;
def int_aarch64_sve_fcvt_f16f64 : Builtin_SVCVT<"svcvt_f16_f64_m", llvm_nxv8f16_ty, llvm_nxv2i1_ty, llvm_nxv2f64_ty>;		def int_aarch64_sve_fcvt_f16f64 : Builtin_SVCVT<"svcvt_f16_f64_m", llvm_nxv8f16_ty, llvm_nxv2i1_ty, llvm_nxv2f64_ty>;
def int_aarch64_sve_fcvt_f32f64 : Builtin_SVCVT<"svcvt_f32_f64_m", llvm_nxv4f32_ty, llvm_nxv2i1_ty, llvm_nxv2f64_ty>;		def int_aarch64_sve_fcvt_f32f64 : Builtin_SVCVT<"svcvt_f32_f64_m", llvm_nxv4f32_ty, llvm_nxv2i1_ty, llvm_nxv2f64_ty>;
def int_aarch64_sve_usdot_lane : AdvSIMD_SVE_DOT_Indexed_Intrinsic;		def int_aarch64_sve_usdot_lane : AdvSIMD_SVE_DOT_Indexed_Intrinsic;
def int_aarch64_sve_sudot_lane : AdvSIMD_SVE_DOT_Indexed_Intrinsic;		def int_aarch64_sve_sudot_lane : AdvSIMD_SVE_DOT_Indexed_Intrinsic;

//		//
// SVE ACLE: 7.4/5. FP64/FP32 matrix multiply extensions		// SVE ACLE: 7.4/5. FP64/FP32 matrix multiply extensions
//		//
def int_aarch64_sve_fmmla : AdvSIMD_3VectorArg_Intrinsic;		def int_aarch64_sve_fmmla : AdvSIMD_3VectorArg_Intrinsic;

		//
		// SVE ACLE: 7.2. BFloat16 extensions
		//

		def int_aarch64_sve_bfdot : SVE_4Vec_BF16;
		def int_aarch64_sve_bfmlalb : SVE_4Vec_BF16;
		def int_aarch64_sve_bfmlalt : SVE_4Vec_BF16;

		def int_aarch64_sve_bfmmla : SVE_4Vec_BF16;

		def int_aarch64_sve_bfdot_lane : SVE_4Vec_BF16_Indexed;
		def int_aarch64_sve_bfmlalb_lane : SVE_4Vec_BF16_Indexed;
		def int_aarch64_sve_bfmlalt_lane : SVE_4Vec_BF16_Indexed;
}		}

//		//
// SVE2 - Contiguous conflict detection		// SVE2 - Contiguous conflict detection
//		//

def int_aarch64_sve_whilerw_b : SVE2_CONFLICT_DETECT_Intrinsic;		def int_aarch64_sve_whilerw_b : SVE2_CONFLICT_DETECT_Intrinsic;
def int_aarch64_sve_whilerw_h : SVE2_CONFLICT_DETECT_Intrinsic;		def int_aarch64_sve_whilerw_h : SVE2_CONFLICT_DETECT_Intrinsic;
def int_aarch64_sve_whilerw_s : SVE2_CONFLICT_DETECT_Intrinsic;		def int_aarch64_sve_whilerw_s : SVE2_CONFLICT_DETECT_Intrinsic;
def int_aarch64_sve_whilerw_d : SVE2_CONFLICT_DETECT_Intrinsic;		def int_aarch64_sve_whilerw_d : SVE2_CONFLICT_DETECT_Intrinsic;
def int_aarch64_sve_whilewr_b : SVE2_CONFLICT_DETECT_Intrinsic;		def int_aarch64_sve_whilewr_b : SVE2_CONFLICT_DETECT_Intrinsic;
def int_aarch64_sve_whilewr_h : SVE2_CONFLICT_DETECT_Intrinsic;		def int_aarch64_sve_whilewr_h : SVE2_CONFLICT_DETECT_Intrinsic;
def int_aarch64_sve_whilewr_s : SVE2_CONFLICT_DETECT_Intrinsic;		def int_aarch64_sve_whilewr_s : SVE2_CONFLICT_DETECT_Intrinsic;
def int_aarch64_sve_whilewr_d : SVE2_CONFLICT_DETECT_Intrinsic;		def int_aarch64_sve_whilewr_d : SVE2_CONFLICT_DETECT_Intrinsic;

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

Show First 20 Lines • Show All 1,233 Lines • ▼ Show 20 Lines	multiclass sve_prefetch<SDPatternOperator prefetch, ValueType PredTy, Instruction RegImmInst, Instruction RegRegInst, int scale, ComplexPattern AddrCP> {
defm FRINTZ_ZPmZ : sve_fp_2op_p_zd_HSD<0b00011, "frintz", int_aarch64_sve_frintz>;		defm FRINTZ_ZPmZ : sve_fp_2op_p_zd_HSD<0b00011, "frintz", int_aarch64_sve_frintz>;
defm FRINTA_ZPmZ : sve_fp_2op_p_zd_HSD<0b00100, "frinta", int_aarch64_sve_frinta>;		defm FRINTA_ZPmZ : sve_fp_2op_p_zd_HSD<0b00100, "frinta", int_aarch64_sve_frinta>;
defm FRINTX_ZPmZ : sve_fp_2op_p_zd_HSD<0b00110, "frintx", int_aarch64_sve_frintx>;		defm FRINTX_ZPmZ : sve_fp_2op_p_zd_HSD<0b00110, "frintx", int_aarch64_sve_frintx>;
defm FRINTI_ZPmZ : sve_fp_2op_p_zd_HSD<0b00111, "frinti", int_aarch64_sve_frinti>;		defm FRINTI_ZPmZ : sve_fp_2op_p_zd_HSD<0b00111, "frinti", int_aarch64_sve_frinti>;
defm FRECPX_ZPmZ : sve_fp_2op_p_zd_HSD<0b01100, "frecpx", int_aarch64_sve_frecpx>;		defm FRECPX_ZPmZ : sve_fp_2op_p_zd_HSD<0b01100, "frecpx", int_aarch64_sve_frecpx>;
defm FSQRT_ZPmZ : sve_fp_2op_p_zd_HSD<0b01101, "fsqrt", int_aarch64_sve_fsqrt>;		defm FSQRT_ZPmZ : sve_fp_2op_p_zd_HSD<0b01101, "fsqrt", int_aarch64_sve_fsqrt>;

let Predicates = [HasBF16, HasSVE] in {		let Predicates = [HasBF16, HasSVE] in {
def BFDOT_ZZZ : sve_bfloat_dot<"bfdot">;		defm BFDOT_ZZZ : sve_bfloat_dot<"bfdot", int_aarch64_sve_bfdot>;
def BFDOT_ZZI : sve_bfloat_dot_indexed<"bfdot">;		defm BFDOT_ZZI : sve_bfloat_dot_indexed<"bfdot", int_aarch64_sve_bfdot_lane>;
def BFMMLA_ZZZ : sve_bfloat_matmul<"bfmmla">;		defm BFMMLA_ZZZ : sve_bfloat_matmul<"bfmmla", int_aarch64_sve_bfmmla>;
def BFMMLA_B_ZZZ : sve_bfloat_matmul_longvecl<0b0, "bfmlalb">;		defm BFMMLA_B_ZZZ : sve_bfloat_matmul_longvecl<0b0, "bfmlalb", int_aarch64_sve_bfmlalb>;
def BFMMLA_T_ZZZ : sve_bfloat_matmul_longvecl<0b1, "bfmlalt">;		defm BFMMLA_T_ZZZ : sve_bfloat_matmul_longvecl<0b1, "bfmlalt", int_aarch64_sve_bfmlalt>;
def BFMMLA_B_ZZI : sve_bfloat_matmul_longvecl_idx<0b0, "bfmlalb">;		defm BFMMLA_B_ZZI : sve_bfloat_matmul_longvecl_idx<0b0, "bfmlalb", int_aarch64_sve_bfmlalb_lane>;
def BFMMLA_T_ZZI : sve_bfloat_matmul_longvecl_idx<0b1, "bfmlalt">;		defm BFMMLA_T_ZZI : sve_bfloat_matmul_longvecl_idx<0b1, "bfmlalt", int_aarch64_sve_bfmlalt_lane>;
def BFCVT_ZPmZ : sve_bfloat_convert<0b1, "bfcvt">;		defm BFCVT_ZPmZ : sve_bfloat_convert<0b1, "bfcvt", int_aarch64_sve_fcvt_bf16f32>;
def BFCVTNT_ZPmZ : sve_bfloat_convert<0b0, "bfcvtnt">;		defm BFCVTNT_ZPmZ : sve_bfloat_convert<0b0, "bfcvtnt", int_aarch64_sve_fcvtnt_bf16f32>;
}		}

// InstAliases		// InstAliases
def : InstAlias<"mov $Zd, $Zn",		def : InstAlias<"mov $Zd, $Zn",
(ORR_ZZZ ZPR64:$Zd, ZPR64:$Zn, ZPR64:$Zn), 1>;		(ORR_ZZZ ZPR64:$Zd, ZPR64:$Zn, ZPR64:$Zn), 1>;
def : InstAlias<"mov $Pd, $Pg/m, $Pn",		def : InstAlias<"mov $Pd, $Pg/m, $Pn",
(SEL_PPPP PPR8:$Pd, PPRAny:$Pg, PPR8:$Pn, PPR8:$Pd), 1>;		(SEL_PPPP PPR8:$Pd, PPRAny:$Pg, PPR8:$Pn, PPR8:$Pd), 1>;
def : InstAlias<"mov $Pd, $Pn",		def : InstAlias<"mov $Pd, $Pn",

llvm/lib/Target/AArch64/SVEInstrFormats.td


	class sve_bfloat_dot<string asm>			class sve_bfloat_dot<string asm>
	: sve_bfloat_dot_base<0b10, asm, "\t$Zda, $Zn, $Zm",			: sve_bfloat_dot_base<0b10, asm, "\t$Zda, $Zn, $Zm",
	(ins ZPR32:$_Zda, ZPR16:$Zn, ZPR16:$Zm)> {			(ins ZPR32:$_Zda, ZPR16:$Zn, ZPR16:$Zm)> {
	bits<5> Zm;			bits<5> Zm;
	let Inst{20-16} = Zm;			let Inst{20-16} = Zm;
	}			}

				multiclass sve_bfloat_dot<string asm, SDPatternOperator op> {
				def NAME : sve_bfloat_dot<asm>;
				def : SVE_3_Op_Pat<nxv4f32, op, nxv4f32, nxv8bf16, nxv8bf16 ,!cast<Instruction>(NAME)>;
				}

	class sve_bfloat_dot_indexed<string asm>			class sve_bfloat_dot_indexed<string asm>
	: sve_bfloat_dot_base<0b01, asm, "\t$Zda, $Zn, $Zm$iop",			: sve_bfloat_dot_base<0b01, asm, "\t$Zda, $Zn, $Zm$iop",
	(ins ZPR32:$_Zda, ZPR16:$Zn, ZPR3b16:$Zm, VectorIndexS:$iop)> {			(ins ZPR32:$_Zda, ZPR16:$Zn, ZPR3b16:$Zm, VectorIndexS:$iop)> {
	bits<2> iop;			bits<2> iop;
	bits<3> Zm;			bits<3> Zm;
	let Inst{20-19} = iop;			let Inst{20-19} = iop;
	let Inst{18-16} = Zm;			let Inst{18-16} = Zm;
	}			}

				multiclass sve_bfloat_dot_indexed<string asm, SDPatternOperator op> {
				def NAME : sve_bfloat_dot_indexed<asm>;
				def : SVE_4_Op_Imm_Pat<nxv4f32, op, nxv4f32, nxv8bf16, nxv8bf16, i64, VectorIndexS_timm, !cast<Instruction>(NAME)>;
				}

	class sve_bfloat_matmul<string asm>			class sve_bfloat_matmul<string asm>
	: I<(outs ZPR32:$Zda), (ins ZPR32:$_Zda, ZPR16:$Zn, ZPR16:$Zm),			: I<(outs ZPR32:$Zda), (ins ZPR32:$_Zda, ZPR16:$Zn, ZPR16:$Zm),
	asm, "\t$Zda, $Zn, $Zm", "", []>, Sched<[]> {			asm, "\t$Zda, $Zn, $Zm", "", []>, Sched<[]> {
	bits<5> Zm;			bits<5> Zm;
	bits<5> Zda;			bits<5> Zda;
	bits<5> Zn;			bits<5> Zn;
	let Inst{31-21} = 0b01100100011;			let Inst{31-21} = 0b01100100011;
	let Inst{20-16} = Zm;			let Inst{20-16} = Zm;
	let Inst{15-10} = 0b111001;			let Inst{15-10} = 0b111001;
	let Inst{9-5} = Zn;			let Inst{9-5} = Zn;
	let Inst{4-0} = Zda;			let Inst{4-0} = Zda;

	let Constraints = "$Zda = $_Zda";			let Constraints = "$Zda = $_Zda";
	let DestructiveInstType = DestructiveOther;			let DestructiveInstType = DestructiveOther;
	let ElementSize = ElementSizeH;			let ElementSize = ElementSizeH;
	}			}

				multiclass sve_bfloat_matmul<string asm, SDPatternOperator op> {
				def NAME : sve_bfloat_matmul<asm>;
				def : SVE_3_Op_Pat<nxv4f32, op, nxv4f32, nxv8bf16, nxv8bf16 ,!cast<Instruction>(NAME)>;
				}

	class sve_bfloat_matmul_longvecl<bit BT, string asm>			class sve_bfloat_matmul_longvecl<bit BT, string asm>
	: sve_bfloat_matmul<asm> {			: sve_bfloat_matmul<asm> {
	let Inst{23} = 0b1;			let Inst{23} = 0b1;
	let Inst{14-13} = 0b00;			let Inst{14-13} = 0b00;
	let Inst{10} = BT;			let Inst{10} = BT;
	}			}

				multiclass sve_bfloat_matmul_longvecl<bit BT, string asm, SDPatternOperator op> {
				def NAME : sve_bfloat_matmul_longvecl<BT, asm>;
				def : SVE_3_Op_Pat<nxv4f32, op, nxv4f32, nxv8bf16, nxv8bf16 ,!cast<Instruction>(NAME)>;
				}

	class sve_bfloat_matmul_longvecl_idx<bit BT, string asm>			class sve_bfloat_matmul_longvecl_idx<bit BT, string asm>
	: sve_bfloat_dot_base<0b01, asm, "\t$Zda, $Zn, $Zm$iop",			: sve_bfloat_dot_base<0b01, asm, "\t$Zda, $Zn, $Zm$iop",
	(ins ZPR32:$_Zda, ZPR16:$Zn, ZPR3b16:$Zm, VectorIndexH:$iop)> {			(ins ZPR32:$_Zda, ZPR16:$Zn, ZPR3b16:$Zm, VectorIndexH:$iop)> {
	bits<3> iop;			bits<3> iop;
	bits<3> Zm;			bits<3> Zm;
	let Inst{23} = 0b1;			let Inst{23} = 0b1;
	let Inst{20-19} = iop{2-1};			let Inst{20-19} = iop{2-1};
	let Inst{18-16} = Zm;			let Inst{18-16} = Zm;
	let Inst{11} = iop{0};			let Inst{11} = iop{0};
	let Inst{10} = BT;			let Inst{10} = BT;
	}			}

				multiclass sve_bfloat_matmul_longvecl_idx<bit BT, string asm, SDPatternOperator op> {
				def NAME : sve_bfloat_matmul_longvecl_idx<BT, asm>;
				def : SVE_4_Op_Imm_Pat<nxv4f32, op, nxv4f32, nxv8bf16, nxv8bf16, i64, VectorIndexH_timm, !cast<Instruction>(NAME)>;
				}

	class sve_bfloat_convert<bit N, string asm>			class sve_bfloat_convert<bit N, string asm>
	: I<(outs ZPR16:$Zd), (ins ZPR16:$_Zd, PPR3bAny:$Pg, ZPR32:$Zn),			: I<(outs ZPR16:$Zd), (ins ZPR16:$_Zd, PPR3bAny:$Pg, ZPR32:$Zn),
	asm, "\t$Zd, $Pg/m, $Zn", "", []>, Sched<[]> {			asm, "\t$Zd, $Pg/m, $Zn", "", []>, Sched<[]> {
	bits<5> Zd;			bits<5> Zd;
	bits<3> Pg;			bits<3> Pg;
	bits<5> Zn;			bits<5> Zn;
	let Inst{31-25} = 0b0110010;			let Inst{31-25} = 0b0110010;
	let Inst{24} = N;			let Inst{24} = N;
	let Inst{23-13} = 0b10001010101;			let Inst{23-13} = 0b10001010101;
	let Inst{12-10} = Pg;			let Inst{12-10} = Pg;
	let Inst{9-5} = Zn;			let Inst{9-5} = Zn;
	let Inst{4-0} = Zd;			let Inst{4-0} = Zd;

	let Constraints = "$Zd = $_Zd";			let Constraints = "$Zd = $_Zd";
	let DestructiveInstType = DestructiveOther;			let DestructiveInstType = DestructiveOther;
	let hasSideEffects = 1;			let hasSideEffects = 1;
	let ElementSize = ElementSizeS;			let ElementSize = ElementSizeS;
	}			}

				multiclass sve_bfloat_convert<bit N, string asm, SDPatternOperator op> {
				def NAME : sve_bfloat_convert<N, asm>;
				def : SVE_3_Op_Pat<nxv8bf16, op, nxv8bf16, nxv8i1, nxv4f32, !cast<Instruction>(NAME)>;
				}

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// SVE Integer Matrix Multiply Group			// SVE Integer Matrix Multiply Group
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	class sve_int_matmul<bits<2> uns, string asm>			class sve_int_matmul<bits<2> uns, string asm>
	: I<(outs ZPR32:$Zda), (ins ZPR32:$_Zda, ZPR8:$Zn, ZPR8:$Zm), asm,			: I<(outs ZPR32:$Zda), (ins ZPR32:$_Zda, ZPR8:$Zn, ZPR8:$Zm), asm,
	"\t$Zda, $Zn, $Zm", "", []>, Sched<[]> {			"\t$Zda, $Zn, $Zm", "", []>, Sched<[]> {
	bits<5> Zda;			bits<5> Zda;
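In `sve_bfloat_matmul_longvecl_idx` above, the 3-bit lane index is not stored contiguously: `Inst{20-19}` holds `iop{2-1}` and `Inst{11}` holds `iop{0}`. The following sketch shows that bit placement in C++. The function names are illustrative assumptions, and only the index bits are modelled, not the rest of the encoding.

```cpp
#include <cassert>
#include <cstdint>

// Places a 3-bit lane index into an instruction word the way the
// sve_bfloat_matmul_longvecl_idx class describes:
//   Inst{20-19} = iop{2-1}, Inst{11} = iop{0}.
// Hypothetical helper for illustration; not part of the patch.
inline std::uint32_t encodeLaneIndexBits(unsigned Iop) {
  assert(Iop < 8 && "lane index is a 3-bit immediate");
  std::uint32_t Hi = (Iop >> 1) & 0x3; // iop{2-1}
  std::uint32_t Lo = Iop & 0x1;        // iop{0}
  return (Hi << 19) | (Lo << 11);
}

// Inverse of the above: recovers the lane index from an instruction word.
inline unsigned decodeLaneIndexBits(std::uint32_t Inst) {
  return (((Inst >> 19) & 0x3) << 1) | ((Inst >> 11) & 0x1);
}
```

This matches the range exercised by the `bfmlalb`/`bfmlalt` lane tests (indices 0 to 7), whereas `sve_bfloat_dot_indexed` uses a 2-bit index stored only in `Inst{20-19}`.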

llvm/test/CodeGen/AArch64/sve-intrinsics-bfloat.ll

This file was added.

				; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve,+bf16 -asm-verbose=0 < %s | FileCheck %s

				;
				; BFDOT
				;

				define <vscale x 4 x float> @bfdot_f32(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c) nounwind {
				; CHECK-LABEL: bfdot_f32:
				; CHECK-NEXT: bfdot z0.s, z1.h, z2.h
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.bfdot(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c)
				ret <vscale x 4 x float> %out
				}

				define <vscale x 4 x float> @bfdot_lane_0_f32(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c) nounwind {
				; CHECK-LABEL: bfdot_lane_0_f32:
				; CHECK-NEXT: bfdot z0.s, z1.h, z2.h[0]
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.bfdot.lane(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c, i64 0)
				ret <vscale x 4 x float> %out
				}

				define <vscale x 4 x float> @bfdot_lane_1_f32(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c) nounwind {
				; CHECK-LABEL: bfdot_lane_1_f32:
				; CHECK-NEXT: bfdot z0.s, z1.h, z2.h[1]
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.bfdot.lane(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c, i64 1)
				ret <vscale x 4 x float> %out
				}

				define <vscale x 4 x float> @bfdot_lane_2_f32(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c) nounwind {
				; CHECK-LABEL: bfdot_lane_2_f32:
				; CHECK-NEXT: bfdot z0.s, z1.h, z2.h[2]
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.bfdot.lane(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c, i64 2)
				ret <vscale x 4 x float> %out
				}

				define <vscale x 4 x float> @bfdot_lane_3_f32(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c) nounwind {
				; CHECK-LABEL: bfdot_lane_3_f32:
				; CHECK-NEXT: bfdot z0.s, z1.h, z2.h[3]
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.bfdot.lane(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c, i64 3)
				ret <vscale x 4 x float> %out
				}

				;
				; BFMLALB
				;

				define <vscale x 4 x float> @bfmlalb_f32(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c) nounwind {
				; CHECK-LABEL: bfmlalb_f32:
				; CHECK-NEXT: bfmlalb z0.s, z1.h, z2.h
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalb(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c)
				ret <vscale x 4 x float> %out
				}

				define <vscale x 4 x float> @bfmlalb_lane_0_f32(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c) nounwind {
				; CHECK-LABEL: bfmlalb_lane_0_f32:
				; CHECK-NEXT: bfmlalb z0.s, z1.h, z2.h[0]
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalb.lane(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c, i64 0)
				ret <vscale x 4 x float> %out
				}

				define <vscale x 4 x float> @bfmlalb_lane_1_f32(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c) nounwind {
				; CHECK-LABEL: bfmlalb_lane_1_f32:
				; CHECK-NEXT: bfmlalb z0.s, z1.h, z2.h[1]
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalb.lane(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c, i64 1)
				ret <vscale x 4 x float> %out
				}

				define <vscale x 4 x float> @bfmlalb_lane_2_f32(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c) nounwind {
				; CHECK-LABEL: bfmlalb_lane_2_f32:
				; CHECK-NEXT: bfmlalb z0.s, z1.h, z2.h[2]
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalb.lane(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c, i64 2)
				ret <vscale x 4 x float> %out
				}

				define <vscale x 4 x float> @bfmlalb_lane_3_f32(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c) nounwind {
				; CHECK-LABEL: bfmlalb_lane_3_f32:
				; CHECK-NEXT: bfmlalb z0.s, z1.h, z2.h[3]
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalb.lane(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c, i64 3)
				ret <vscale x 4 x float> %out
				}

				define <vscale x 4 x float> @bfmlalb_lane_4_f32(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c) nounwind {
				; CHECK-LABEL: bfmlalb_lane_4_f32:
				; CHECK-NEXT: bfmlalb z0.s, z1.h, z2.h[4]
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalb.lane(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c, i64 4)
				ret <vscale x 4 x float> %out
				}

				define <vscale x 4 x float> @bfmlalb_lane_5_f32(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c) nounwind {
				; CHECK-LABEL: bfmlalb_lane_5_f32:
				; CHECK-NEXT: bfmlalb z0.s, z1.h, z2.h[5]
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalb.lane(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c, i64 5)
				ret <vscale x 4 x float> %out
				}

				define <vscale x 4 x float> @bfmlalb_lane_6_f32(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c) nounwind {
				; CHECK-LABEL: bfmlalb_lane_6_f32:
				; CHECK-NEXT: bfmlalb z0.s, z1.h, z2.h[6]
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalb.lane(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c, i64 6)
				ret <vscale x 4 x float> %out
				}

				define <vscale x 4 x float> @bfmlalb_lane_7_f32(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c) nounwind {
				; CHECK-LABEL: bfmlalb_lane_7_f32:
				; CHECK-NEXT: bfmlalb z0.s, z1.h, z2.h[7]
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalb.lane(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c, i64 7)
				ret <vscale x 4 x float> %out
				}

				;
				; BFMLALT
				;

				define <vscale x 4 x float> @bfmlalt_f32(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c) nounwind {
				; CHECK-LABEL: bfmlalt_f32:
				; CHECK-NEXT: bfmlalt z0.s, z1.h, z2.h
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalt(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c)
				ret <vscale x 4 x float> %out
				}

				define <vscale x 4 x float> @bfmlalt_lane_0_f32(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c) nounwind {
				; CHECK-LABEL: bfmlalt_lane_0_f32:
				; CHECK-NEXT: bfmlalt z0.s, z1.h, z2.h[0]
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalt.lane(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c, i64 0)
				ret <vscale x 4 x float> %out
				}

				define <vscale x 4 x float> @bfmlalt_lane_1_f32(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c) nounwind {
				; CHECK-LABEL: bfmlalt_lane_1_f32:
				; CHECK-NEXT: bfmlalt z0.s, z1.h, z2.h[1]
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalt.lane(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c, i64 1)
				ret <vscale x 4 x float> %out
				}

				define <vscale x 4 x float> @bfmlalt_lane_2_f32(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c) nounwind {
				; CHECK-LABEL: bfmlalt_lane_2_f32:
				; CHECK-NEXT: bfmlalt z0.s, z1.h, z2.h[2]
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalt.lane(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c, i64 2)
				ret <vscale x 4 x float> %out
				}

				define <vscale x 4 x float> @bfmlalt_lane_3_f32(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c) nounwind {
				; CHECK-LABEL: bfmlalt_lane_3_f32:
				; CHECK-NEXT: bfmlalt z0.s, z1.h, z2.h[3]
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalt.lane(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c, i64 3)
				ret <vscale x 4 x float> %out
				}

				define <vscale x 4 x float> @bfmlalt_lane_4_f32(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c) nounwind {
				; CHECK-LABEL: bfmlalt_lane_4_f32:
				; CHECK-NEXT: bfmlalt z0.s, z1.h, z2.h[4]
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalt.lane(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c, i64 4)
				ret <vscale x 4 x float> %out
				}

				define <vscale x 4 x float> @bfmlalt_lane_5_f32(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c) nounwind {
				; CHECK-LABEL: bfmlalt_lane_5_f32:
				; CHECK-NEXT: bfmlalt z0.s, z1.h, z2.h[5]
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalt.lane(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c, i64 5)
				ret <vscale x 4 x float> %out
				}

				define <vscale x 4 x float> @bfmlalt_lane_6_f32(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c) nounwind {
				; CHECK-LABEL: bfmlalt_lane_6_f32:
				; CHECK-NEXT: bfmlalt z0.s, z1.h, z2.h[6]
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalt.lane(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c, i64 6)
				ret <vscale x 4 x float> %out
				}

				define <vscale x 4 x float> @bfmlalt_lane_7_f32(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c) nounwind {
				; CHECK-LABEL: bfmlalt_lane_7_f32:
				; CHECK-NEXT: bfmlalt z0.s, z1.h, z2.h[7]
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.bfmlalt.lane(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c, i64 7)
				ret <vscale x 4 x float> %out
				}

				;
				; BFMMLA
				;

				define <vscale x 4 x float> @bfmmla_f32(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c) nounwind {
				; CHECK-LABEL: bfmmla_f32:
				; CHECK-NEXT: bfmmla z0.s, z1.h, z2.h
				; CHECK-NEXT: ret
				%out = call <vscale x 4 x float> @llvm.aarch64.sve.bfmmla(<vscale x 4 x float> %a, <vscale x 8 x bfloat> %b, <vscale x 8 x bfloat> %c)
				ret <vscale x 4 x float> %out
				}

				;
				; BFCVT
				;

				define <vscale x 8 x bfloat> @fcvt_bf16_f32(<vscale x 8 x bfloat> %a, <vscale x 8 x i1> %pg, <vscale x 4 x float> %b) nounwind {
				; CHECK-LABEL: fcvt_bf16_f32:
				; CHECK-NEXT: bfcvt z0.h, p0/m, z1.s
				; CHECK-NEXT: ret
				%out = call <vscale x 8 x bfloat> @llvm.aarch64.sve.fcvt.bf16f32(<vscale x 8 x bfloat> %a, <vscale x 8 x i1> %pg, <vscale x 4 x float> %b)
				ret <vscale x 8 x bfloat> %out
				}

				;
				; BFCVTNT
				;

				define <vscale x 8 x bfloat> @fcvtnt_bf16_f32(<vscale x 8 x bfloat> %a, <vscale x 8 x i1> %pg, <vscale x 4 x float> %b) nounwind {
				; CHECK-LABEL: fcvtnt_bf16_f32:
				; CHECK-NEXT: bfcvtnt z0.h, p0/m, z1.s
				; CHECK-NEXT: ret
				%out = call <vscale x 8 x bfloat> @llvm.aarch64.sve.fcvtnt.bf16f32(<vscale x 8 x bfloat> %a, <vscale x 8 x i1> %pg, <vscale x 4 x float> %b)
				ret <vscale x 8 x bfloat> %out
				}

				declare <vscale x 4 x float> @llvm.aarch64.sve.bfdot(<vscale x 4 x float>, <vscale x 8 x bfloat>, <vscale x 8 x bfloat>)
				declare <vscale x 4 x float> @llvm.aarch64.sve.bfdot.lane(<vscale x 4 x float>, <vscale x 8 x bfloat>, <vscale x 8 x bfloat>, i64)
				declare <vscale x 4 x float> @llvm.aarch64.sve.bfmlalb(<vscale x 4 x float>, <vscale x 8 x bfloat>, <vscale x 8 x bfloat>)
				declare <vscale x 4 x float> @llvm.aarch64.sve.bfmlalb.lane(<vscale x 4 x float>, <vscale x 8 x bfloat>, <vscale x 8 x bfloat>, i64)
				declare <vscale x 4 x float> @llvm.aarch64.sve.bfmlalt(<vscale x 4 x float>, <vscale x 8 x bfloat>, <vscale x 8 x bfloat>)
				declare <vscale x 4 x float> @llvm.aarch64.sve.bfmlalt.lane(<vscale x 4 x float>, <vscale x 8 x bfloat>, <vscale x 8 x bfloat>, i64)
				declare <vscale x 4 x float> @llvm.aarch64.sve.bfmmla(<vscale x 4 x float>, <vscale x 8 x bfloat>, <vscale x 8 x bfloat>)
				declare <vscale x 8 x bfloat> @llvm.aarch64.sve.fcvt.bf16f32(<vscale x 8 x bfloat>, <vscale x 8 x i1>, <vscale x 4 x float>)
				declare <vscale x 8 x bfloat> @llvm.aarch64.sve.fcvtnt.bf16f32(<vscale x 8 x bfloat>, <vscale x 8 x i1>, <vscale x 4 x float>)
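The tests above only check instruction selection. As a reading aid for the operand shapes (`<vscale x 4 x float>` accumulator, `<vscale x 8 x bfloat>` inputs), here is a scalar sketch of what the unindexed `bfdot` computes per 32-bit lane. It uses ordinary `float` arithmetic and deliberately ignores the instruction's exact rounding and denormal behaviour, so it is an approximation for illustration only. `bf16ToFloat` and `bfdotModel` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Widens a bfloat16 bit pattern to float: bfloat16 occupies the top 16 bits
// of an IEEE-754 binary32 value.
inline float bf16ToFloat(std::uint16_t Bits) {
  std::uint32_t Wide = static_cast<std::uint32_t>(Bits) << 16;
  float F;
  std::memcpy(&F, &Wide, sizeof(F));
  return F;
}

// Scalar model of the unindexed bfdot: each 32-bit accumulator lane adds the
// two products of the bfloat16 element pair that overlaps it.
// Assumption: plain float arithmetic stands in for the hardware behaviour.
inline std::vector<float> bfdotModel(std::vector<float> Acc,
                                     const std::vector<std::uint16_t> &B,
                                     const std::vector<std::uint16_t> &C) {
  assert(B.size() == 2 * Acc.size() && C.size() == B.size());
  for (std::size_t I = 0; I < Acc.size(); ++I)
    Acc[I] += bf16ToFloat(B[2 * I]) * bf16ToFloat(C[2 * I]) +
              bf16ToFloat(B[2 * I + 1]) * bf16ToFloat(C[2 * I + 1]);
  return Acc;
}
```

For example, with an accumulator lane of 1.0 and input pairs (1.0, 2.0) and (2.0, 2.0), the model returns 1.0 + 1.0·2.0 + 2.0·2.0 = 7.0.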

Revision Contents

Diff 272480

clang/include/clang/Basic/arm_sve.td

clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfdot.c

clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfmlalb.c

clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfmlalt.c

clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfmmla.c

clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cvt-bfloat.c

clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_cvtnt.c

clang/utils/TableGen/SveEmitter.cpp

llvm/include/llvm/IR/IntrinsicsAArch64.td

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

llvm/lib/Target/AArch64/SVEInstrFormats.td

llvm/test/CodeGen/AArch64/sve-intrinsics-bfloat.ll
