This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Add __bf16 as new Bfloat16 C Type
ClosedPublic

Authored by stuij on Mar 12 2020, 9:59 AM.

Details

Summary

This patch upstreams support for a new storage only bfloat16 C type.
This type is used to implement primitive support for bfloat16 data, in
line with the Bfloat16 extension of the Armv8.6-a architecture, as
detailed here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

The bfloat type, and its properties are specified in the Arm Architecture
Reference Manual:

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

In detail this patch:

  • introduces an opaque, storage-only C-type __bf16, which introduces a new bfloat IR type.

This is part of a patch series, starting with command-line and Bfloat16
assembly support. The subsequent patches will upstream intrinsics
support for BFloat16, followed by Matrix Multiplication and the
remaining Virtualization features of the armv8.6-a architecture.

The following people contributed to this patch:

  • Luke Cheeseman
  • Momchil Velikov
  • Alexandros Lamprineas
  • Luke Geeson
  • Simon Tatham
  • Ties Stuij

Diff Detail

Event Timeline

LukeGeeson created this revision.Mar 12 2020, 9:59 AM
craig.topper added inline comments.Mar 12 2020, 10:56 AM
clang/lib/Basic/Targets/AArch64.cpp
74

Doesn't Bfloat16 have a different number of mantissa and exponent bits than IEEEhalf?

clang/test/CodeGen/arm-mangle-16bit-float.cpp
4 ↗(On Diff #249965)

How can bfloat16 be passed as half? Don't they have a different format?

I don't understand why you wouldn't add a new IR type for this; doing so should be totally mechanical.

jfb added a reviewer: jfb.Mar 12 2020, 11:16 AM

Note that we have an IR type for the PPC double-double format, which isn't even hardware-supported. This is literally just an IEEE floating-point format with non-standard parameters, right?

Note that we have an IR type for the PPC double-double format, which isn't even hardware-supported. This is literally just an IEEE floating-point format with non-standard parameters, right?

Indeed yeah, the details are laid out in a blog post here: https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a

I don't understand why you wouldn't add a new IR type for this; doing so should be totally mechanical.

We had a few reasons for doing it this way.

By adding a new IR type we would need to consider calling conventions, and IR optimizations for what is essentially an opaque storage-only type.

Bfloat has no soft ABI, and all the supported operations can be handled through intrinsics (in a later patch, removed here due to bloat). If we were to add a new IR type, then we would need to handle many operations which would extend beyond what the type is supported for. For instance in GCC we had to add a new mode in RTL to handle inline memcopy.

LukeGeeson marked an inline comment as done.Mar 13 2020, 6:12 AM
LukeGeeson added inline comments.
clang/test/CodeGen/arm-mangle-16bit-float.cpp
4 ↗(On Diff #249965)

see the above comment about soft-abis

I don't understand why you wouldn't add a new IR type for this; doing so should be totally mechanical.

We had a few reasons for doing it this way.

By adding a new IR type we would need to consider calling conventions, and IR optimizations for what is essentially an opaque storage-only type.

IR optimizations should just fall out; the code in APFloat should work for arbitrary FP semantics.

Calling something a "storage-only" type does not get you out of worrying about calling conventions at the ABI level. You can forbid passing your type as an argument or result directly, but structs containing your type as a field can still be passed around, and the behavior for that needs to be defined.

Bfloat has no soft ABI, and all the supported operations can be handled through intrinsics (in a later patch, removed here due to bloat). If we were to add a new IR type, then we would need to handle many operations which would extend beyond what the type is supported for. For instance in GCC we had to add a new mode in RTL to handle inline memcopy.

I don't know why having a soft ABI makes a difference. If the type only works on certain platforms, then it can't be used on other platforms.

Added a file from a downstream cleanup of the branch.

jfb added a comment.Mar 13 2020, 11:57 AM

I don't understand why you wouldn't add a new IR type for this; doing so should be totally mechanical.

We had a few reasons for doing it this way.

By adding a new IR type we would need to consider calling conventions, and IR optimizations for what is essentially an opaque storage-only type.

IR optimizations should just fall out; the code in APFloat should work for arbitrary FP semantics.

Calling something a "storage-only" type does not get you out of worrying about calling conventions at the ABI level. You can forbid passing your type as an argument or result directly, but structs containing your type as a field can still be passed around, and the behavior for that needs to be defined.

Bfloat has no soft ABI, and all the supported operations can be handled through intrinsics (in a later patch, removed here due to bloat). If we were to add a new IR type, then we would need to handle many operations which would extend beyond what the type is supported for. For instance in GCC we had to add a new mode in RTL to handle inline memcopy.

I don't know why having a soft ABI makes a difference. If the type only works on certain platforms, then it can't be used on other platforms.

Having an IR type sounds like the right thing to do here.

stuij added a comment.Mar 18 2020, 4:54 AM

Thanks for the feedback. We'll work on implementing a bfloat IR type.

SjoerdMeijer added a comment.EditedMar 18 2020, 5:29 AM

That sounds good and that seems to address the feedback given here, which I agree with. Just wanted to briefly add that while it already looks like most interested people are on this ticket, perhaps it good to also share the plan and design on the cfe dev list for visibility.

stuij commandeered this revision.Apr 15 2020, 4:28 AM
stuij edited reviewers, added: LukeGeeson; removed: stuij.

commandeering as I've got some changes to push

stuij added a comment.Apr 15 2020, 4:31 AM
This comment was removed by stuij.
stuij updated this revision to Diff 258018.Apr 16 2020, 4:59 AM

changes: depend on bfloat IR type, and resolve some conflicts that crept in

stuij updated this revision to Diff 261851.May 4 2020, 9:43 AM

I've amended this patch to conform to the general Bfloat -> BFloat convention
changes requested in the dependent BFloat IR patch (D78190).

I also folded in two other downstream patches, which allow for the ?: operator
on bf16 values and prohibit conversions between integer and __bf16.

asmith added a subscriber: asmith.May 6 2020, 8:35 PM

really happy for native bfloat support!

clang/include/clang/AST/ASTContext.h
967

Maybe the comment is unnecessary. I can imagine other targets besides ARM NEON using BFloat

clang/include/clang/AST/Type.h
2011

don't think this comment adds anything

clang/lib/AST/ASTContext.cpp
5995

extra spaces

clang/lib/AST/ItaniumMangle.cpp
3188

Add a space before break and on the line above.

stuij marked 2 inline comments as done.May 13 2020, 6:10 AM

@asmith: Thanks for reviewing :) Fixed your suggestions in the coming revision.

clang/include/clang/AST/ASTContext.h
967

fair

clang/include/clang/AST/Type.h
2011

fair

stuij updated this revision to Diff 263719.May 13 2020, 7:44 AM

changes:

  • addressed asmith's comments
  • Bfloat doesn't depend on half cmdline options and correct code is generation for AArch32 hard and softfp CC

I authored some of the code for this patch, please update the commit message with my name on the list :)

majnemer added inline comments.
clang/include/clang-c/Index.h
3258

Should this be:

CXType_LastBuiltin = CXType_BFloat16,
clang/lib/AST/ItaniumMangle.cpp
3188

Why is this x1?

majnemer added inline comments.May 14 2020, 8:27 AM
clang/lib/Sema/SemaOverload.cpp
1873–1874

This probably needs an explanation.

SjoerdMeijer added inline comments.May 14 2020, 8:50 AM
clang/docs/LanguageExtensions.rst
520

Not my field of expertise, and sorry if I've missed this somewhere, but was wondering if this (i.e. all language modes) means we need C++ name mangling support for bfloat? In the AArch64 backend I saw this:

getBFloat16Mangling() const override { return "u6__bf16"; }

But that's something else I guess, and I was guessing a 2 letter mangling needs to be added here?

https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling-builtin

stuij marked 4 inline comments as done.May 16 2020, 4:29 PM

@LukeGeeson: you're already mentioned :) See the commits tab on this review. But Simon Tatham needs a mention as well. I shall add him.

clang/docs/LanguageExtensions.rst
520

Yes, we need name-mangling support, and yes that method you mentioned does that. There's one for AArch32 as well. And yes we have documented it in the 'C++ ABI for the latest Arm Architecture' docs:

aarch32: https://developer.arm.com/docs/ihi0041/latest
aarch64: https://developer.arm.com/docs/ihi0059/latest

clang/include/clang-c/Index.h
3258

thanks

clang/lib/AST/ItaniumMangle.cpp
3188

Yes, that does look odd. The original author of this code has left the company, but I'll ask around.

clang/lib/Sema/SemaOverload.cpp
1873–1874

done

stuij updated this revision to Diff 264560.May 18 2020, 2:01 AM

addressing review comments and adding Simon Tatham to contributers

stuij updated this revision to Diff 264562.May 18 2020, 2:04 AM

redo: addressing review comments and adding Simon Tatham to contributers

SjoerdMeijer added inline comments.May 18 2020, 5:38 AM
clang/docs/LanguageExtensions.rst
520

But does that mean that for other targets this is not (yet) supported?

stuij marked an inline comment as done.May 18 2020, 8:25 AM
stuij added inline comments.
clang/docs/LanguageExtensions.rst
520

Currently bfloat is a vendor-extended type, hence the 'u' prefix. So to me it would make sense that backends should implement their own naming, as there's no guarantee that the naming scheme chosen for Arm is compatible with all backends.

SjoerdMeijer added inline comments.May 18 2020, 8:40 AM
clang/docs/LanguageExtensions.rst
520

Ok, true, fair enough

SjoerdMeijer added inline comments.May 18 2020, 9:05 AM
clang/include/clang/Basic/DiagnosticSemaKinds.td
8124

Nit: was wondering if err_cast_to_bfloat16 would be more consistent

clang/lib/AST/ASTContext.cpp
6010

nit: perhaps this error message is not entirely accurate for bfloat16?

clang/lib/Basic/Targets/AArch64.cpp
74

Nit: we use bfloat16 everywhere, would llvm::APFloat::BFloat16() be better for consistency?

clang/lib/CodeGen/CodeGenModule.cpp
115

nit: perhaps getBFloat16Ty()

clang/lib/CodeGen/CodeGenTypeCache.h
39

And so here too, BFloat16Ty?

clang/lib/CodeGen/TargetInfo.cpp
5987

Nit: to distinguish the unfortunate float ABI names, I was wondering whether IsFloatABISoftFP is clearer, or something along those lines, just to make clear it is not the "soft" but the "softfp" variant

There are Sema and CodeGen tests, but was wondering if there would be some value in having an AST test too? There are some other types that do have AST tests.

clang/lib/AST/ASTContext.cpp
2054

Is a break missing here?

clang/lib/CodeGen/CodeGenTypes.cpp
307

Is this covered in tests?

clang/lib/CodeGen/TargetInfo.cpp
6599

nit: camelcase fase for haslegalhalf

clang/test/CodeGen/arm-mangle-16bit-float.cpp
4 ↗(On Diff #249965)

Probably I am bit confused too now... are the three possible types that we are expecing not bfloat, i32, or i16?

2 ↗(On Diff #264562)

Do you need to pass +fullfp16?

stuij edited the summary of this revision. (Show Details)May 20 2020, 3:47 AM
RKSimon resigned from this revision.May 21 2020, 1:40 AM
LukeGeeson added inline comments.May 21 2020, 11:59 AM
clang/test/CodeGen/arm-mangle-16bit-float.cpp
4 ↗(On Diff #249965)

Hi Sjoerd, Good spot here I forgot to add a default case somewhere, which means AArch32 didn't observe -mfloat-abi softfp by default - and hence used bfloat where i32 was expected.

I have a local patch for this and will send to Ties to merge into this one :)

We're not sure what you mean by i16 however?

stuij edited the summary of this revision. (Show Details)May 22 2020, 3:10 AM
stuij updated this revision to Diff 265753.May 22 2020, 9:06 AM
stuij marked an inline comment as done.

no explicit float-abi cmdline arg should default to softfp

stuij updated this revision to Diff 265985.May 25 2020, 2:43 AM
stuij marked 14 inline comments as done.

addressed review comments and some related general changes:

  • renamed IsSoftFloatABI -> IsFloatABISoftFP
  • split bfloat tests out of arm-mangle-16bit-float.cpp
  • bfloat shouldn't be checking for NativeHalfType
  • err_cast_(from|to)_bfloat -> err_cast_(from|to)_bfloat16
  • reworded comments and other minor fixes
stuij added inline comments.May 25 2020, 2:48 AM
clang/include/clang/Basic/DiagnosticSemaKinds.td
8124

agree

clang/lib/AST/ASTContext.cpp
2054

tnx

6010

tnx

clang/lib/AST/ItaniumMangle.cpp
3188

I'm not sure it's 100% wrong, but it's not consistent with GCC. Changing to bfloat16_t.

clang/lib/Basic/Targets/AArch64.cpp
74

In the IR world bfloat is consistently called bfloat, without the 16. I think this might turn into bikeshedding to justify if either one or both sides of the divide should or should not include the '16'. I can think of arguments to support the various options.

As I think there's no clear winner, I'd like to take the route of less effort and keep things as is.

clang/lib/CodeGen/CodeGenModule.cpp
115

see previous C - IR divide comment

clang/lib/CodeGen/CodeGenTypeCache.h
39

see previous C - IR divide comment

clang/lib/CodeGen/CodeGenTypes.cpp
307

Thanks. That code doesn't make sense to me. We are not expecting to emulate bfloat operations, and we shouldn't conflate bfloat with half. I'll remove the UseNativeHalf clause, and the one call-site of this fn.

clang/lib/CodeGen/TargetInfo.cpp
5987

Yes, good idea. Done.

clang/test/CodeGen/arm-mangle-16bit-float.cpp
2 ↗(On Diff #264562)

I'm splitting out bf16 and fp16 tests. I don't think we should be testing with the union of cmdline options to accommodate testing multiple types in one file.

As I wrote in one of the inlined comments, I am relatively unhappy that we have both bfloat and bfloat16 as names. But that looks like a complain for another patch, and not really this one.

clang/lib/Basic/Targets/AArch64.cpp
74

ah, I couldn't remember on which patch I commented on the bfloat16 vs bfloat naming.
Since everything is called bfloat16, from the architecture extension to the source language type, I find this a bit of a missed opportunity to get the names consistent. And I wouldn't say that naming of types is bikeshedding. But maybe that's just me...

clang/lib/Basic/Targets/ARM.cpp
80

Is this tested? Can it be tested?

clang/lib/CodeGen/CGBuiltin.cpp
4484

nit: the linter says some spaces here.

5517–5518

nit: looks like the hinter right to be unhappy about the formatting

clang/lib/CodeGen/TargetInfo.cpp
6279

nit: ir -> IR

6480

nit: perhaps add TODO, or just remove this.

clang/lib/Sema/SemaOverload.cpp
1874

Nit: conversions -> Conversions, permitted -> permitted.

Same on line no. 1895.

clang/test/CodeGen/arm-bf16-params-returns.c
6

what happens with -mfloat-abi=soft. Does that deserve a test?

clang/test/CodeGen/arm-mangle-16bit-float.cpp
4 ↗(On Diff #249965)

Ah, as I said, got confused. Now I see we use i16 for targets that don't support bfloat16. So we probably need a test for that.

clang/test/CodeGen/arm-mangle-bf16.cpp
4

nit: if the name mangling is Independent of the float ABI, then you might as well one runline and the abi option.

SjoerdMeijer added inline comments.May 27 2020, 12:28 PM
clang/test/CodeGen/arm-mangle-bf16.cpp
4

-> ... then you might as well have only one runline and remove the abi option.

stuij marked 9 inline comments as done.May 29 2020, 7:59 AM
stuij added inline comments.
clang/lib/Basic/Targets/AArch64.cpp
74

In general naming isn't bikeshedding. And I appreciate discussions on naming. I'm just afraid it might turn into bikeshedding. I do agree that such a remark isn't a very constructive to a conversation. Sorry.

clang/test/CodeGen/arm-bf16-params-returns.c
6

Yes, this one is interesting. I think we shouldn't support bfloat at all in combination with -mfloat-abi=soft. We don't support software emulation of bfloat instructions and all operations on bfloat are simd instructions.

It turns out cc1 will accept -mfloat-abi=soft with neon intrinsics, which will happily churn out neon instructions. This doesn't sound very soft. The driver will ignore -mfloat-abi=soft in certain combinations of cmdline instructions, but I haven't delved deep enough to know what's what.

GCC doesn't allow soft+neon combination. Unfortunately it will actually crash for just a bfloat type by itself, which is quite useless without intrinsics. The Arm GCC folks will raise a ticket on this with as proposed solution to not allow this combination.

As this issue seems bigger than just bfloat, and potentially there's driver code involved as well I thought it'd make sense to handle this in a separate patch.

clang/test/CodeGen/arm-mangle-16bit-float.cpp
4 ↗(On Diff #249965)

As I wrote above, I think we currently should not allow it as long as there are no targets that support bfloat and have use for this combination.

clang/test/CodeGen/arm-mangle-bf16.cpp
4

yes, I thought it's better to be defensive here, in case the implementation changes.

SjoerdMeijer added inline comments.Jun 1 2020, 12:58 AM
clang/test/CodeGen/arm-bf16-params-returns.c
6

I think we first need agreement what -mfloat-abi=soft with bf16 means and how it should behave, document this, and have some tests. Possibly document how we diverge from this.

I think I tend to disagree with this:

I think we shouldn't support bfloat at all in combination with -mfloat-abi=soft.

why would it not be supported in some way (promotions to another type, or even library calls), like the other float-types?

It turns out cc1 will accept -mfloat-abi=soft with neon intrinsics, which will happily churn out neon instructions.

I would say the fact that there are other problems, shouldn't distract us too much from trying to get this right; I think at this point that is not yet a justification.

As this issue seems bigger than just bfloat, and potentially there's driver code involved as well I thought it'd make sense to handle this in a separate patch.

I think at this point I disagree with this, mainly because of my first point: the behaviour should be specified. I would also say that not doing this, could be a bit of bad precedent for adding a new C type.

labrinea added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
4495

This is not what we should be checking for here. Imagine a command line with +bf16 and +mfloat-abi=softfp, that should generate i16, not bfloat. We therefore need a target hook to pass this information. I suggest allowBFloatArgsAndRet() in ABIInfo, returning false by default, overloaded for Arm and AArch64 to return !IsFloatABISoftFP && hasBFloat16Type() and hasBFloat16Type() respectively.

stuij marked 2 inline comments as done.Jun 1 2020, 6:38 AM
stuij added inline comments.
clang/lib/Basic/Targets/ARM.cpp
80

Yes, I think it's a bit tricky to test this all by itself. Right now this is used when return a TypeInfo, which is used in all kinds of different contexts. I think alignment is better tested where it is eventually expressed.

Can you summarise where we are? I.e.,

  • float-abi=soft doesn't work. But what is the problem? Are we not simply passing i16s, is that not what we are supposed to do?

Can you also update the description of this patch, I got totally confused by:

  • "introduces an opaque, storage-only C-type __bf16, which does not introduce a new LLVM IR type, but maps it to either i16 or half type."
clang/test/CodeGen/arm-mangle-16bit-float.cpp
1 ↗(On Diff #265985)

This is testing name mangling of __fp16, so is not related to this patch, please commit this separately if this is not tested yet. But I do suspect Dh is tested already, but I could be wrong.

stuij marked an inline comment as done.Jun 1 2020, 7:48 AM

Can you summarise where we are? I.e.,

  • float-abi=soft doesn't work. But what is the problem? Are we not simply passing i16s, is that not what we are supposed to do?

At the moment when going through the GCC compatibility driver (standard interface), we get __bf16 is not supported on this target.
When using -cc1, we can pass -mfloat-abi=soft, we can compile source-code that contains __bf16, and we can also compile neon intrinsics in general. It looks like we just ignore soft was passed and instead we use the hard float-abi. For intrinsics we should error.

In general we pass i32's btw, not i16's. Also for __fp16.

Can you also update the description of this patch, I got totally confused by:

  • "introduces an opaque, storage-only C-type __bf16, which does not introduce a new LLVM IR type, but maps it to either i16 or half type."

willdo

stuij edited the summary of this revision. (Show Details)Jun 1 2020, 7:49 AM
SjoerdMeijer added a comment.EditedJun 1 2020, 8:39 AM

At the moment when going through the GCC compatibility driver (standard interface), we get __bf16 is not supported on this target.

If this is the current behaviour, and consistent with GCC, that sounds reasonable. Probably best to be explicit about this and document this in LangRef. With a note that this is subject to change?
Oh, and I guess that means we can add a test this for this? We can check this error message?

stuij added a comment.Jun 1 2020, 9:17 AM

At the moment when going through the GCC compatibility driver (standard interface), we get __bf16 is not supported on this target.

If this is the current behaviour, and consistent with GCC, that sounds reasonable.

Unfortunately, this is not consistent with GCC. GCC will happily compile.

stuij updated this revision to Diff 268335.Jun 3 2020, 5:01 PM

Addressed review comments. Notably what to do in combination with -mfloat-abi=soft.

SjoerdMeijer accepted this revision.Jun 4 2020, 12:57 AM

Thanks, LGTM

This revision is now accepted and ready to land.Jun 4 2020, 12:57 AM
labrinea added inline comments.Jun 4 2020, 2:51 AM
clang/lib/CodeGen/CGBuiltin.cpp
6989

shouldn't this be getTargetHooks().getABIInfo().allowBFloatArgsAndRet() ?

stuij marked 2 inline comments as done.Jun 4 2020, 4:08 PM
stuij added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
6989

thanks, done

stuij updated this revision to Diff 268614.Jun 4 2020, 4:09 PM
stuij marked an inline comment as done.

Resolved merge conflicts with head. Fixed minor oversight.

This revision was automatically updated to reflect the committed changes.