Page MenuHomePhabricator

[AArch64]: BFloat Load/Store Intrinsics&CodeGen
ClosedPublic

Authored by LukeGeeson on May 28 2020, 5:56 AM.

Details

Summary

This patch upstreams support for ld / st variants of BFloat intrinsics
in from __bf16 to AArch64. This includes IR intrinsics. Unittests are
provided as needed.

This patch is part of a series implementing the Bfloat16 extension of
the
Armv8.6-a architecture, as detailed here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

The bfloat type, and its properties are specified in the Arm
Architecture
Reference Manual:

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

The following people contributed to this patch:

  • Luke Geeson
  • Momchil Velikov
  • Luke Cheeseman

Diff Detail

Event Timeline

LukeGeeson created this revision.May 28 2020, 5:56 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMay 28 2020, 5:56 AM
labrinea added inline comments.
clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c
37

CHECK-NEXT or CHECK-DAG are preferable for sequences.

181

where are the check lines?

stuij requested changes to this revision.May 28 2020, 6:42 AM

We need testing for the backend code.

This revision now requires changes to proceed.May 28 2020, 6:42 AM
simon_tatham added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
10366 ↗(On Diff #266829)

What effect is this change of strategy having on the alignment computation, for the already-supported instances of this builtin?

It looks to me as if __builtin_neon_vld1_v with (say) a uint8_t * pointer argument will now compute Alignment=1 (the natural alignment of the pointee type), whereas it would previously have computed Alignment=8 (the size of the whole vector being loaded or stored).

Is that intentional? Or accidental? Or have I completely misunderstood what's going on?

(Whichever of the three, some discussion in the commit message would be a good idea, explaining why this change does or does not make a difference, as appropriate.)

LukeGeeson updated this revision to Diff 267896.Jun 2 2020, 8:30 AM
LukeGeeson marked 4 inline comments as done.
LukeGeeson added a subscriber: pratlucas.
LukeGeeson added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
10366 ↗(On Diff #266829)

Clang was incorrectly assuming that all the pointers from which loads were being generated for vld1 intrinsics were aligned according to according to the intrinsics result type. This causes alignment faults on the code generated by the backend.
This fixes the issue so that alignment is based on the type of the pointer provided to as input to the intrinsic.

@pratlucas has done some work on this in parallel https://reviews.llvm.org/D79721 which has been approved and may overrule this particular line of code.

I shall add a note to the commit message, and tentatively mark this as fixed, given it's liable to adopt the work of Lucas.

clang/test/CodeGen/aarch64-bf16-ldst-intrinsics.c
37

Added to be consistent with the rest of the file (ie no CHECK-NEXT, but CHECK32/64)

181

Added to be consistent with the rest of the file

We need testing for the backend code.

@stuij I have added aarch64-bf16-ldst-intrinsics.ll to test the backend. Please let me know if this is ok :)

pratlucas added inline comments.Jun 2 2020, 9:38 AM
clang/lib/CodeGen/CGBuiltin.cpp
10366 ↗(On Diff #266829)

Hi @LukeGeeson ,
Just as a heads up, some changes to this implementations were requested on D79721.
The usage of CGM.getNaturalPointeeTypeAlignment and IgnoreParenCasts() was causing problems on certain argument types, so the alignment is now captured from the expression itself when it is emitted.

LukeGeeson updated this revision to Diff 267908.Jun 2 2020, 9:38 AM

Accidentally added dotprod tests here rather than the child commit - just removed them

LukeGeeson marked an inline comment as done.Jun 2 2020, 9:45 AM
LukeGeeson added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
10366 ↗(On Diff #266829)

Thanks @pratlucas yeah I plan to rebase my code onto upstream when you have made those changes (and fix whatever still breaks) before I push :)

Besides from rebasing to get @pratlucas changes upstream.

@stuij please could you confirm if you are happy with this, so I can merge

stuij added a comment.Jun 4 2020, 3:22 PM

Besides from rebasing to get @pratlucas changes upstream.

@stuij please could you confirm if you are happy with this, so I can merge

Hi Luke,

For the backend tests it would be good if you would use CHECK-NEXT from label to ret, like I believe you did in the other patch, using -asm-verbose=0 to get rid of the cruft.

Besides from rebasing to get @pratlucas changes upstream.

@stuij please could you confirm if you are happy with this, so I can merge

Hi Luke,

For the backend tests it would be good if you would use CHECK-NEXT from label to ret, like I believe you did in the other patch, using -asm-verbose=0 to get rid of the cruft.

Similar to my other comment in the [[ https://reviews.llvm.org/D80752 | other ]]patch:

This isn't how to get rid of kill statements. In particular if you pass -asm-verbose=0 to llc in the RUN statement then no CHECKs are generated, let alone kill statements.

Instead to get this desired result you run llc without that argument, and then manually remove these unnecessary kill lines. This is what I have done and this should fix this. Patch incoming

Besides from rebasing to get @pratlucas changes upstream.

@stuij please could you confirm if you are happy with this, so I can merge

Hi Luke,

For the backend tests it would be good if you would use CHECK-NEXT from label to ret, like I believe you did in the other patch, using -asm-verbose=0 to get rid of the cruft.

Similar to my other comment in the [[ https://reviews.llvm.org/D80752 | other ]]patch:

This isn't how to get rid of kill statements. In particular if you pass -asm-verbose=0 to llc in the RUN statement then no CHECKs are generated, let alone kill statements.

Instead to get this desired result you run llc without that argument, and then manually remove these unnecessary kill lines. This is what I have done and this should fix this. Patch incoming

Further, you cannot use CHECK-NEXT if your test function contains such a kill statement, unless you manually remove it, and use CHECK in place (it fails when running FileCheck as it sees such lines in the output and hence check-next fails if it doesn't expect it). This is just something we must balance if we want clear tests and direct 1-1 correspondence with the result. I've used CHECK-NEXT where I can, but CHECK where I must

LukeGeeson updated this revision to Diff 269569.Jun 9 2020, 9:22 AM
  • rebased my patch off of upstream llvm master
  • removed my code in favour of more recent code by @pratlucas
  • used update CC to generate tests without kill statements
  • used CHECK-NEXT where possible, unless kill statements have been removed, in this case CHECK is used
LukeGeeson updated this revision to Diff 269574.Jun 9 2020, 9:42 AM

updated check32 -> check-next32, same for check64

fixed check typo

arsenm added a subscriber: arsenm.Jun 10 2020, 6:57 AM
arsenm added inline comments.
llvm/test/CodeGen/AArch64/aarch64-bf16-ldst-intrinsics.ll
265

Why is the IR type name bfloat and not bfloat16?

LukeGeeson marked 2 inline comments as done.Jun 10 2020, 7:08 AM
LukeGeeson added inline comments.
llvm/test/CodeGen/AArch64/aarch64-bf16-ldst-intrinsics.ll
265

The naming for the IR type was agreed upon here after quite a big discussion.
https://reviews.llvm.org/D78190

SjoerdMeijer added inline comments.Jun 10 2020, 8:09 AM
llvm/test/CodeGen/AArch64/aarch64-bf16-ldst-intrinsics.ll
265

I regret very much that I didn't notice this earlier... I.e., I noticed this in D76077 and wrote that I am relatively unhappy about this (I think I mentioned this on another ticket too).
Because like @arsenm , I would expect the IR type name to be bfloat16.

Correct me if I am wrong, but I don't see a big discussion about this in D78190. I only see 1 or 2 comments about BFloat vs Bfloat.

LukeGeeson marked 2 inline comments as done.Jun 10 2020, 8:33 AM
LukeGeeson added inline comments.
llvm/test/CodeGen/AArch64/aarch64-bf16-ldst-intrinsics.ll
265

I cannot see a discussion about the IR type name per-se but I can see you were both involved in the discussion more generally.

I am concerned that this patch is the wrong place to discuss such issues, and that we should bring this up in a more appropriate place as you mention so that this patch isn't held back.

chill added a subscriber: chill.Jun 10 2020, 8:41 AM
chill added inline comments.
llvm/test/CodeGen/AArch64/aarch64-bf16-ldst-intrinsics.ll
265

I don't see a compelling reason for the name to be bfloat16 or bfloat3, etc. Like other floating-point types (float, double, and half), the name denotes a specific externally defined format, unlike iN.

SjoerdMeijer added inline comments.Jun 10 2020, 9:05 AM
llvm/test/CodeGen/AArch64/aarch64-bf16-ldst-intrinsics.ll
265

Like other floating-point types (float, double, and half), the name denotes a specific externally defined format,

Is the defined format not called bfloat16?

chill added inline comments.Jun 10 2020, 9:25 AM
llvm/test/CodeGen/AArch64/aarch64-bf16-ldst-intrinsics.ll
265

Indeed, people use the name "bfloat16". But then the half, float, and double also differ from the official binary16, binarty32, and binary64.
IMHO bfloat fits better in the LLVM IR naming convention.

SjoerdMeijer added inline comments.Jun 10 2020, 10:53 AM
llvm/test/CodeGen/AArch64/aarch64-bf16-ldst-intrinsics.ll
265

yeah, so that's exactly why I don't follow your logic. If there's any logic in the names here, the mapping from source-language type to IR type seems the most plausible one. And I just don't see the benefit of dropping the 16, and how that would fit better in some naming scheme or how that makes things clearer here.

chill added inline comments.Jun 10 2020, 11:08 AM
llvm/test/CodeGen/AArch64/aarch64-bf16-ldst-intrinsics.ll
265

What source language?

That said, I'm resigning from the bikeshedding here.

chill removed a subscriber: chill.Jun 10 2020, 11:09 AM
chill removed a subscriber: momchil.velikov.
stuij added inline comments.Jun 11 2020, 4:29 AM
llvm/test/CodeGen/AArch64/aarch64-bf16-ldst-intrinsics.ll
265

Just as a house-keeping note: If we would change the naming, I think we can all agree that this ticket itself shouldn't be the place where we want to do this. I'm happy for the conversation to carry on here, but I think we can move the ticket forward at the same time.

917

You should be able to do without all these big blocks of attributes which I guess were generated from C -> IR conversion. Just remove it and the #xs after the function declarations (maybe replace them with nounwind).

  • removed unnecessary contents of test
stuij accepted this revision.Jun 12 2020, 7:15 AM

LGTM. Thanks!

This revision is now accepted and ready to land.Jun 12 2020, 7:15 AM
This revision was automatically updated to reflect the committed changes.