This is an archive of the discontinued LLVM Phabricator instance.

[sve][acle] Implement some of the C intrinsics for brain float.
ClosedPublic

Authored by fpetrogalli on Jun 22 2020, 8:28 PM.

Details

Summary

The following intrinsics have been extended to support brain float types:

svbfloat16_t svclasta[_bf16](svbool_t pg, svbfloat16_t fallback, svbfloat16_t data)
bfloat16_t svclasta[_n_bf16](svbool_t pg, bfloat16_t fallback, svbfloat16_t data)
bfloat16_t svlasta[_bf16](svbool_t pg, svbfloat16_t op)

svbfloat16_t svclastb[_bf16](svbool_t pg, svbfloat16_t fallback, svbfloat16_t data)
bfloat16_t svclastb[_n_bf16](svbool_t pg, bfloat16_t fallback, svbfloat16_t data)
bfloat16_t svlastb[_bf16](svbool_t pg, svbfloat16_t op)

svbfloat16_t svdup[_n]_bf16(bfloat16_t op)
svbfloat16_t svdup[_n]_bf16_m(svbfloat16_t inactive, svbool_t pg, bfloat16_t op)
svbfloat16_t svdup[_n]_bf16_x(svbool_t pg, bfloat16_t op)
svbfloat16_t svdup[_n]_bf16_z(svbool_t pg, bfloat16_t op)

svbfloat16_t svdupq[_n]_bf16(bfloat16_t x0, bfloat16_t x1, bfloat16_t x2, bfloat16_t x3, bfloat16_t x4, bfloat16_t x5, bfloat16_t x6, bfloat16_t x7)
svbfloat16_t svdupq_lane[_bf16](svbfloat16_t data, uint64_t index)

svbfloat16_t svinsr[_n_bf16](svbfloat16_t op1, bfloat16_t op2)
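
For readers unfamiliar with these operations, here is a rough scalar model of what the last/clast and insr intrinsics compute. It is a sketch, not code from this patch: the `model_*` names and the `bf16_bits` typedef are hypothetical, bfloat values are carried as raw 16-bit patterns, the predicate is a plain `bool` array, and the vector length `n` is passed explicitly and assumed to be non-zero. Only the scalar-returning (`_n`) forms of clasta/clastb are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: a bf16 value carried as its raw 16-bit pattern. */
typedef uint16_t bf16_bits;

/* Index of the last active lane, or -1 when no lane is active. */
static ptrdiff_t last_active(const bool *pg, size_t n) {
    for (size_t i = n; i > 0; --i)
        if (pg[i - 1])
            return (ptrdiff_t)(i - 1);
    return -1;
}

/* lastb: the last active element; the final element if no lane is active. */
bf16_bits model_lastb(const bool *pg, const bf16_bits *op, size_t n) {
    assert(n > 0);
    ptrdiff_t last = last_active(pg, n);
    return op[last < 0 ? n - 1 : (size_t)last];
}

/* lasta: the element after the last active one, wrapping to element 0.
   With no active lane this is element 0. */
bf16_bits model_lasta(const bool *pg, const bf16_bits *op, size_t n) {
    assert(n > 0);
    ptrdiff_t last = last_active(pg, n);
    return op[(size_t)(last + 1) % n];
}

/* clastb: like lastb, but returns `fallback` when no lane is active. */
bf16_bits model_clastb(const bool *pg, bf16_bits fallback,
                       const bf16_bits *data, size_t n) {
    assert(n > 0);
    ptrdiff_t last = last_active(pg, n);
    return last < 0 ? fallback : data[last];
}

/* clasta: like lasta, but returns `fallback` when no lane is active. */
bf16_bits model_clasta(const bool *pg, bf16_bits fallback,
                       const bf16_bits *data, size_t n) {
    assert(n > 0);
    ptrdiff_t last = last_active(pg, n);
    return last < 0 ? fallback : data[(size_t)(last + 1) % n];
}

/* insr: shift every element up by one lane and place op2 in lane 0. */
void model_insr(bf16_bits *op1, size_t n, bf16_bits op2) {
    assert(n > 0);
    for (size_t i = n - 1; i > 0; --i)
        op1[i] = op1[i - 1];
    op1[0] = op2;
}
```

With a four-lane vector {10, 20, 30, 40} and only the first two lanes active, this model gives 20 for lastb and 30 for lasta; with no active lanes, clastb and clasta return the fallback.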

Event Timeline

fpetrogalli created this revision. Jun 22 2020, 8:28 PM

Updates:

  1. Extracted the bfloat C tests into separate files (`*-bfloat.c`).
  2. Added missing tests (clast[a|b], last[a|b]).
  3. Tested that a warning is raised for the missing declaration when the macro __ARM_FEATURE_SVE_BF16 is not defined.
  4. Cosmetic changes to formatting.
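
As an illustration of the macro guard mentioned in item 3, user code can be written along these lines. This is a minimal sketch, not taken from the patch or its tests: `splat_and_read_back` is a hypothetical helper, and the non-SVE fallback branch is an assumption added so that the same source compiles on targets without the feature.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_FEATURE_SVE_BF16)
#include <arm_sve.h>

/* Broadcast one bf16 bit pattern to every lane, then read the last lane back. */
uint16_t splat_and_read_back(uint16_t bits) {
    bfloat16_t x;
    assert(sizeof x == sizeof bits);
    memcpy(&x, &bits, sizeof x);          /* reinterpret the bits as bfloat16_t */
    svbfloat16_t v = svdup_n_bf16(x);
    bfloat16_t last = svlastb_bf16(svptrue_b16(), v);
    uint16_t out;
    memcpy(&out, &last, sizeof out);
    return out;
}
#else
/* Assumed fallback: without SVE bf16 support the round trip is the identity. */
uint16_t splat_and_read_back(uint16_t bits) {
    return bits;
}
#endif
```

Either branch should return its argument unchanged, for example 0x3F80 (the bf16 encoding of 1.0). Without the `#if`, a build that lacks the feature has no declaration for the bf16 intrinsics, which is the situation item 3 checks.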

We need to guard the LLVM patterns on the +bf16 feature, as we've done in other patches.

clang/include/clang/Basic/arm_sve.td
681

__ARM_FEATURE_SVE_BF16 will imply __ARM_FEATURE_BF16_SCALAR_ARITHMETIC, so guarding only on the former should be sufficient. The same applies below.

817

nit: remove double spaces

clang/lib/CodeGen/CGBuiltin.cpp
7724–7725

already added in D82399, you should see it when rebasing

llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll
317–318

nit: alignment

665

nit: alignment

llvm/test/CodeGen/AArch64/sve-intrinsics-scalar-to-vec.ll
65–66

nit: alignment

fpetrogalli marked 5 inline comments as done.

This patch needed some love...

@c-rhodes, I have addressed your feedback, thank you.

I have also predicated all the instruction selection patterns on -mattr=+bf16, and I have updated the tests to use the per-function attribute instead of adding the extra -mattr=+bf16 option on the command line.

fpetrogalli marked an inline comment as done. Jun 25 2020, 1:29 PM

@fpetrogalli thanks for updating! I have a few more comments; sorry, I missed a few things yesterday.

clang/include/clang/Basic/arm_sve.td
688–690

nit: could you fix the spacing? I don't think it's worth trying to keep the two defs in line; single spaces everywhere would do.

1149

__ARM_FEATURE_BF16_SCALAR_ARITHMETIC can be removed

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
419–420

I think we're missing a test for this pattern in llvm/test/CodeGen/AArch64/sve-vector-splat.ll? Same applies to dup 0 patterns below.

420–427

formatting changes can be reverted

1464–1466

missing tests in llvm/test/CodeGen/AArch64/sve-bitcast.ll

fpetrogalli marked 7 inline comments as done.

Hi @c-rhodes,

I have addressed all your comments but one, the one that asks to add the test cases for the splat, as it deserves a separate patch.

I will ping you when it is ready.

Francesco

fpetrogalli added inline comments. Jun 26 2020, 12:22 PM
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
419–420

I have added these patterns so that the regression tests can be added in this patch, which means the tests guard them to some extent. I tried to add the test cases to sve-vector-splat.ll anyway, but the following one crashes the compiler, so the whole "splatting a bfloat constant" case deserves a separate patch.

define <vscale x 8 x bfloat> @splat_nxv8bf16_imm() #0 {
; CHECK-LABEL: splat_nxv8bf16_imm:
; CHECK: mov z0.h, #1.0
; CHECK-NEXT: ret
  %1 = insertelement <vscale x 8 x bfloat> undef, bfloat 1.0, i32 0
  %2 = shufflevector <vscale x 8 x bfloat> %1, <vscale x 8 x bfloat> undef, <vscale x 8 x i32> zeroinitializer
  ret <vscale x 8 x bfloat> %2
}

I will create a new revision and make it a parent of this one.

1464–1466

The bitconvert patterns went in via D82501. This code is not present anymore in this patch.

fpetrogalli marked an inline comment as done. Jun 26 2020, 12:40 PM
fpetrogalli added inline comments.
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
419–420

(facepalm) There is no "dup" instruction for bfloat immediates... that's why this is not working. I guess a separate patch is not needed, this one is enough...

@fpetrogalli thanks for updating, LGTM!

c-rhodes accepted this revision. Jun 29 2020, 3:13 AM
This revision is now accepted and ready to land. Jun 29 2020, 3:13 AM

Rebased on top of master. NFC.

This revision was automatically updated to reflect the committed changes.