This is an archive of the discontinued LLVM Phabricator instance.

[AArch64]: BFloat MatMul Intrinsics&CodeGen
ClosedPublic

Authored by LukeGeeson on May 28 2020, 12:17 PM.

Details

Summary

This patch upstreams support for BFloat Matrix Multiplication Intrinsics
and Code Generation from __bf16 to AArch64. This includes IR intrinsics. Unit tests are
provided as needed. AArch32 Intrinsics + CodeGen will come after this
patch.
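For readers unfamiliar with the operation, the matrix multiply-accumulate behind the BFMMLA instruction (exposed in C, as far as I know, as the ACLE intrinsic vbfmmlaq_f32) can be modelled in scalar C roughly as follows. This is a sketch based on the Arm ARM description, not code from the patch. bfmmla_ref and bf16_to_float are hypothetical names, bf16 values are passed as raw uint16_t bit patterns, and ordinary float arithmetic is used, so the instruction's own intermediate rounding and denormal handling are not reproduced.

```c
#include <stdint.h>
#include <string.h>

/* bf16 is the upper half of an IEEE-754 binary32 value, so widening a
   bf16 bit pattern to float is an exact 16-bit shift. */
static float bf16_to_float(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical scalar model of one 128-bit BFMMLA step:
     acc (2x2 float, row-major) += A (2x4 bf16, row-major) * B (4x2 bf16).
   B is stored one column after another, so b[4*j + k] is row k of column j.
   Plain float arithmetic is used; the instruction's own intermediate
   rounding is deliberately not modelled. */
static void bfmmla_ref(float acc[4], const uint16_t a[8], const uint16_t b[8]) {
    for (int i = 0; i < 2; ++i) {          /* row of A and of acc */
        for (int j = 0; j < 2; ++j) {      /* column of B and of acc */
            float sum = acc[2 * i + j];
            for (int k = 0; k < 4; ++k)    /* 4-way dot product */
                sum += bf16_to_float(a[4 * i + k]) * bf16_to_float(b[4 * j + k]);
            acc[2 * i + j] = sum;
        }
    }
}
```

If B holds the two unit columns (1,0,0,0) and (0,1,0,0), each output element simply picks one element of A; with small integer inputs every result is exact, which makes a model like this easy to sanity-check.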

This patch is part of a series implementing the Bfloat16 extension of the
Armv8.6-A architecture, as detailed here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

The bfloat type and its properties are specified in the Arm Architecture
Reference Manual:

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
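As a concrete illustration of those properties: a bfloat16 value keeps the sign bit, the full 8-bit exponent and the top 7 fraction bits of an IEEE-754 binary32 float, so it covers float's range with much less precision. The C sketch below converts between the two in software using round-to-nearest-even. float_to_bf16_rne and bf16_to_float are hypothetical helper names, and the results are not guaranteed to match the BFCVT instruction bit for bit in every FPCR mode (for example with flush-to-zero enabled).

```c
#include <stdint.h>
#include <string.h>

/* Widening is exact: a bf16 value is the top 16 bits of a binary32 float. */
static float bf16_to_float(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical helper: narrow a float to bf16, rounding to nearest with ties
   to even. NaNs are returned as quiet NaNs instead of being rounded. */
static uint16_t float_to_bf16_rne(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    if ((u & 0x7FFFFFFFu) > 0x7F800000u)          /* NaN: set the quiet bit */
        return (uint16_t)((u >> 16) | 0x0040u);
    uint32_t lsb = (u >> 16) & 1u;                /* lowest bit that is kept */
    return (uint16_t)((u + 0x7FFFu + lsb) >> 16); /* add half, ties go to even */
}
```

Widening never loses information, which is why bf16 can be promoted to float with a plain shift; narrowing is where rounding happens. For example, 1 + 3/256 lies exactly halfway between two bf16 values and rounds to the even one, 0x3F82 (1.015625).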

The following people contributed to this patch:

  • Luke Geeson
  • Momchil Velikov
  • Mikhail Maltsev
  • Luke Cheeseman

Event Timeline

LukeGeeson created this revision. May 28 2020, 12:17 PM
Herald added projects: Restricted Project, Restricted Project. May 28 2020, 12:17 PM
miyuki added inline comments. Jun 2 2020, 6:14 AM
clang/test/CodeGen/aarch64-bf16-dotprod-intrinsics.c (line 9)

Why not CHECK-NEXT?
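The point behind this question: a CHECK line may match anywhere after the previous match, while CHECK-NEXT must match on the very next line, so CHECK-NEXT also catches unexpected instructions slipped in between. The toy C model below illustrates only that rule. It is not FileCheck (patterns are plain substrings and matching is per line), and run_checks, struct directive and the sample assembly lines are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model of two FileCheck directives; not FileCheck itself. */
enum check_kind { CK_CHECK, CK_CHECK_NEXT };

struct directive {
    enum check_kind kind;
    const char *pattern;   /* plain substring, no regex support */
};

/* Returns true if all directives match the output, in order.
   CK_CHECK may skip any number of lines; CK_CHECK_NEXT must match the line
   immediately after the previous match (and cannot be the first directive). */
static bool run_checks(const char *const *lines, size_t nlines,
                       const struct directive *dirs, size_t ndirs) {
    size_t pos = 0;                        /* first line not yet consumed */
    for (size_t d = 0; d < ndirs; ++d) {
        if (dirs[d].kind == CK_CHECK_NEXT) {
            if (d == 0 || pos >= nlines || strstr(lines[pos], dirs[d].pattern) == NULL)
                return false;
        } else {
            while (pos < nlines && strstr(lines[pos], dirs[d].pattern) == NULL)
                ++pos;                     /* CHECK: scan forward */
            if (pos == nlines)
                return false;
        }
        ++pos;                             /* consume the matched line */
    }
    return true;
}
```

Both sets of directives accept the expected output, but only the CHECK-NEXT version rejects an output with an extra instruction between the function label and the bfdot.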

llvm/test/CodeGen/AArch64/aarch64-bf16-dotprod-intrinsics.ll (line 5)

Would it make sense to check the whole body of the compiled function?

LukeGeeson updated this revision to Diff 267909. Jun 2 2020, 9:41 AM

Added CHECK-NEXT lines, tested whole functions

LukeGeeson marked an inline comment as done. Jun 2 2020, 9:45 AM
LukeGeeson added inline comments.
llvm/test/CodeGen/AArch64/aarch64-bf16-dotprod-intrinsics.ll (line 5)

Oops, sorry: I'm having all kinds of issues with my commit history here. Give me a moment to address this.

LukeGeeson marked an inline comment as done. Jun 2 2020, 10:16 AM
LukeGeeson added inline comments.
llvm/test/CodeGen/AArch64/aarch64-bf16-dotprod-intrinsics.ll (line 5)

I would say it's not worth testing the whole function here: the only code emitted for each is the instruction mentioned in the CHECK and a ret, surrounded by lots of compiler labels and directives that we don't need to test.

miyuki added inline comments. Jun 2 2020, 10:52 AM
clang/lib/CodeGen/CGBuiltin.cpp (line 16164)

This chunk does not belong to the patch

llvm/test/CodeGen/AArch64/aarch64-bf16-dotprod-intrinsics.ll (line 5)

I meant, just the code from the first BB label to ret (inclusive), without directives. I suggest using llvm/utils/update_llc_test_checks.py to generate the checks.
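The filtering described here (keep the lines from the first basic-block label through ret, drop assembler directives) can be sketched in C as below. This is a hypothetical illustration of the idea, not how update_llc_test_checks.py is implemented: function_body is a made-up name, it assumes AArch64-style "// %bb.N:" block comments, and because it treats every line starting with '.' as a directive it would also drop local labels such as .LBB0_1:, so it only suits single-block functions like the ones in this test.

```c
#include <stddef.h>
#include <string.h>

/* Skip leading blanks and tabs. */
static const char *skip_ws(const char *s) {
    while (*s == ' ' || *s == '\t')
        ++s;
    return s;
}

/* Hypothetical illustration (not update_llc_test_checks.py): store pointers to
   the lines from the first basic-block label ("// %bb.N:" on AArch64) up to and
   including "ret", skipping assembler directives (lines starting with '.').
   Returns the number of lines written to out (at most cap). */
static size_t function_body(const char *const *asm_lines, size_t n,
                            const char **out, size_t cap) {
    size_t i = 0, count = 0;
    while (i < n && strncmp(skip_ws(asm_lines[i]), "// %bb.", 7) != 0)
        ++i;                                   /* find the first BB label */
    for (; i < n && count < cap; ++i) {
        const char *t = skip_ws(asm_lines[i]);
        if (*t == '.')
            continue;                          /* drop directives */
        out[count++] = asm_lines[i];
        if (strncmp(t, "ret", 3) == 0 && (t[3] == '\0' || t[3] == ' ' || t[3] == '\t'))
            break;                             /* stop after ret */
    }
    return count;
}
```

On typical single-block llc output this keeps three lines (the %bb.0 comment, the bfdot and the ret), which is the shape of check the comment above asks for.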

miyuki added inline comments. Jun 2 2020, 2:22 PM
clang/lib/CodeGen/CGBuiltin.cpp (line 16164)

Oops, ignore the previous comment, please.

LukeGeeson updated this revision to Diff 268165. Jun 3 2020, 6:15 AM

Ran llvm/utils/update_llc_test_checks.py on the test to get proper CHECKs

LukeGeeson marked an inline comment as done. Jun 3 2020, 6:16 AM
LukeGeeson added inline comments.
llvm/test/CodeGen/AArch64/aarch64-bf16-dotprod-intrinsics.ll (line 5)

Hopefully this is everything now; please let me know if there is anything else :)

miyuki added inline comments. Jun 3 2020, 8:53 AM
clang/test/CodeGen/aarch64-bf16-dotprod-intrinsics.c (line 3)

Is it possible to avoid running the whole -O2 pipeline and instead run, say,

%clang_cc1 -triple aarch64-arm-none-eabi -target-feature +neon -target-feature +bf16 \
-disable-O0-optnone -emit-llvm %s -o - | opt -S -mem2reg -instcombine | FileCheck %s

Also, I suggest auto-generating the checks using llvm/utils/update_cc_test_checks.py. Sorry, I should have mentioned it in the previous review iteration.

Same file (line 11)

CHECK-NEXT:

  • Used update_cc_test_checks.py to generate correct checks

LukeGeeson marked 3 inline comments as done. Jun 4 2020, 10:18 AM
miyuki accepted this revision. Jun 4 2020, 10:29 AM

LGTM

This revision is now accepted and ready to land. Jun 4 2020, 10:29 AM
stuij added a comment. Jun 4 2020, 3:21 PM

For the backend tests, I suggest using -asm-verbose=0 with llc to only print instructions and get rid of // kill: .. and friends. Use update_llc_test_checks.py again to regenerate the checks.

In reply to the suggestion above:

This isn't how to get rid of kill statements. In particular, if you pass -asm-verbose=0 to llc in the RUN line, then no CHECKs are generated at all, let alone kill statements.

Instead, to get the desired result, run llc without that argument and then manually remove the unnecessary kill lines. This is what I have done, and it should fix this. Patch incoming.

addressed review comments

stuij accepted this revision. Jun 11 2020, 6:04 AM

LGTM. Thanks!

This revision was automatically updated to reflect the committed changes.