This is an archive of the discontinued LLVM Phabricator instance.

[sve][acle] Add SVE BFloat16 extensions.
ClosedPublic

Authored by fpetrogalli on Jun 18 2020, 8:22 PM.

Details

Summary

List of intrinsics:

svfloat32_t svbfdot[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3)
svfloat32_t svbfdot[_n_f32](svfloat32_t op1, svbfloat16_t op2, bfloat16_t op3)
svfloat32_t svbfdot_lane[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3, uint64_t imm_index)

svfloat32_t svbfmmla[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3)

svfloat32_t svbfmlalb[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3)
svfloat32_t svbfmlalb[_n_f32](svfloat32_t op1, svbfloat16_t op2, bfloat16_t op3)
svfloat32_t svbfmlalb_lane[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3, uint64_t imm_index)

svfloat32_t svbfmlalt[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3)
svfloat32_t svbfmlalt[_n_f32](svfloat32_t op1, svbfloat16_t op2, bfloat16_t op3)
svfloat32_t svbfmlalt_lane[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3, uint64_t imm_index)

svbfloat16_t svcvt_bf16[_f32]_m(svbfloat16_t inactive, svbool_t pg, svfloat32_t op)
svbfloat16_t svcvt_bf16[_f32]_x(svbool_t pg, svfloat32_t op)
svbfloat16_t svcvt_bf16[_f32]_z(svbool_t pg, svfloat32_t op)

svbfloat16_t svcvtnt_bf16[_f32]_m(svbfloat16_t even, svbool_t pg, svfloat32_t op)
svbfloat16_t svcvtnt_bf16[_f32]_x(svbfloat16_t even, svbool_t pg, svfloat32_t op)

For reference, see section 7.2 of "Arm C Language Extensions for SVE - Version 00bet4"

Diff Detail

Event Timeline

fpetrogalli created this revision.Jun 18 2020, 8:22 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
sdesmalen added inline comments.Jun 19 2020, 2:01 AM
clang/include/clang/Basic/arm_sve.td
494

The types for these intrinsics are always svfloat32_t and svbfloat16_t, which given their semantics is unlikely to ever be extended to other types, so it's easier to make the LLVM IR non-overloaded (i.e. hardcoding llvm_nxv4f32_ty and llvm_nxv8bf16_ty) and using the IsOverloadNone flag for these builtins. Then you can express this builtin as:

def SVBFDOT: SInst<"svbfdot[_{0}]",  "MMdd", "b", MergeNone, "aarch64_sve_bfdot">;

and drop the need for the $ modifier.

498

Similar to the suggestion above to use "MMdd" for SVBFDOT, this could use "MMda" and you don't need the ~ modifier.

nit: add whitespace above this line.
nit: the rest of this file tries to align the columns, that makes this file a bit easier to read.

1032

nit: redundant comment (same for above)

clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_bfdot.c
27

Testing the edge cases 0 and 3 should be sufficient. (same for all other cases in this patch)

llvm/include/llvm/IR/IntrinsicsAArch64.td
1343

nit: SVE_bfloat is not very descriptive, maybe use SVE_4Vec_BF16 and SVE_4Vec_BF16_Indexed?

1811

nit: use fcvtbf instead of cvt => int_aarch64_sve_fcvtbf_bf16f32 ?

fpetrogalli marked 7 inline comments as done.

Thank you for the review @sdesmalen!

Francesco

llvm/include/llvm/IR/IntrinsicsAArch64.td
1811

Renamed to int_aarch64_sve_fcvt_bf16f32 and int_aarch64_sve_fcvtnt_bf16f32 respectively, because I think it wouldn't make sense to add the bf suffix to the cvtnt version of the intrinsic.

sdesmalen accepted this revision.Jun 22 2020, 12:35 AM

LGTM

llvm/include/llvm/IR/IntrinsicsAArch64.td
1811

I meant to write int_aarch64_sve_bfcvt_bf16f32. This seems consistent with all other intrinsics (fcvt, fcvtzu, scvtf, etc.) that use the name of the instruction directly in the name of the intrinsic.

This revision is now accepted and ready to land.Jun 22 2020, 12:35 AM
sdesmalen added inline comments.Jun 22 2020, 12:36 AM
llvm/include/llvm/IR/IntrinsicsAArch64.td
1345

nit: keep this on one line.

Formatting changes. NFC.

This revision was automatically updated to reflect the committed changes.