This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Basic bfloat support
ClosedPublic

Authored by labrinea on Jun 8 2020, 4:12 AM.

Details

Summary

This patch is part of a series that adds support for the Bfloat16 extension of the Armv8.6-a architecture, as detailed here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
Specifically:

  • it adds the bfloat scalar and vector types in the necessary register classes,
  • it adjusts the calling convention to cope with bfloat argument passing and return,
  • it adds codegen patterns for moves, loads and stores relying on fullfp16.

It's tested mostly by the intrinsic patches that depend on it (load/store, convert/copy).

The bfloat type, and its properties are specified in the Arm Architecture Reference Manual:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
The following people contributed to this patch:

  • Alexandros Lamprineas
  • Ties Stuij

Diff Detail

Event Timeline

labrinea created this revision.Jun 8 2020, 4:12 AM

I believe the codegen patterns for vmov and load/store half are incorrect on the bf16 type. Can someone suggest what is the right approach?

Since this patch adds loads, stores, moves and the calling convention, couldn't those be tested here, without waiting for intrinsics?

What do you think is wrong about those patterns? The VLDRH/VSTRH instructions load/store a 16-bit value between memory and floating-point registers, which should work regardless of the type of the value.

Hey Oliver, thanks for looking at this.

Since this patch adds loads, stores, moves and the calling convention, couldn't those be tested here, without waiting for intrinsics?

Sure, I could add a couple of tests in this revision.

What do you think is wrong about those patterns? The VLDRH/VSTRH instructions load/store a 16-bit value between memory and floating-point registers, which should work regardless of the type of the value.

I had the impression that because of the different format between fp16 and bfloat, it would be wrong to use those instructions on bfloat. Maybe I misunderstood. By the way, the Architecture reference says that the instructions are undefined if the fp16 extension is not supported.

momchil.velikov resigned from this revision.Jun 8 2020, 7:15 AM

Hmm. That sounds like a pain.

Can you split out the (f16 HPR:$Sm) into a separate patch and commit the independantly? That would simplify this up quite a bit.

labrinea updated this revision to Diff 270178.Jun 11 2020, 10:26 AM
labrinea added inline comments.Jun 11 2020, 10:36 AM
llvm/lib/Target/ARM/ARMISelLowering.cpp
2093

Actually, these assertions can go wrong if you are lowering one of the two value types with both the Architecture extensions enabled.

dmgreen added inline comments.Jun 12 2020, 11:15 AM
llvm/lib/Target/ARM/ARMISelLowering.cpp
5901

According to D81411, you can have bf16 without having fp16. And so you don't have any of the instructions like VMOV.f16 (which a VMOVrh will turn into).

Same goes for the vldr.16 int he test below. Because +fp16 isn't specified, we might have to awkwardly use some other set of instructions. It will be more efficient to use vmov.16 and vldr.16 if they are available, but if they are not we might have to fall back to something else.

Or we say that combination isn't supported, but it seems that fp16 is still optional and bf16 is mandatory in 8.6.

labrinea marked an inline comment as done.Jun 17 2020, 10:35 AM
labrinea added inline comments.
llvm/lib/Target/ARM/ARMISelLowering.cpp
5901

Good point. I am going to alter these checks to only guard fullfp16 for now. As the title suggests this is basic support, so I think it's fair to only support the bf16+fullfp16 combination in this revision. I will make sure it is explicitly stated in the commit message.

labrinea updated this revision to Diff 271413.Jun 17 2020, 11:04 AM
labrinea retitled this revision from [WIP] Basic bfloat support on Arm to [ARM] Basic bfloat support.
labrinea edited the summary of this revision. (Show Details)

Changes from last revision:

  • the code generation relies on fullfp16 being present,
  • the unit test also checks the codegen for soft float abi
dmgreen added inline comments.Jun 17 2020, 11:57 AM
llvm/lib/Target/ARM/ARMISelLowering.cpp
731

We only add the FP16 regclass if we have HasFullFP16. Not if we just have the vcvt instructions (HasFP16). I agree it is probably good to make bf16 a legal type if we can, but a lot of operations will not be supported. As far as I understand they are not expected to work, and there will be nothing that can promote them at the moment?

I'm not sure how the AArch64 backend handled it, but do we need something like setAllExpand(..)? Honestly I would expect it to still break if you tried to add two bfloats in IR, but it might be more future-proof.

5081

This controls some things like setcc lowering. I wouldn't expect them to be relevant for bfloat if it is essentially just a storage type.

5901

Sounds fair. I would expect this to be the most common combination, so is good to tackle first.

llvm/lib/Target/ARM/ARMInstrVFP.td
166

I think these patterns should still have let Predicates = [HasFPRegs16] in around them, like we do for all the NEON or MVE patterns.

llvm/test/CodeGen/ARM/bfloat.ll
10

These CHECK lines are left over.

labrinea marked 4 inline comments as done.Jun 18 2020, 1:43 AM
labrinea added inline comments.
llvm/lib/Target/ARM/ARMISelLowering.cpp
731

Maybe for now it'd be better to add the bfloat type if both fullfp16 and bf16 are present. Makes sense?

5081

fair enough, I'll remove it

llvm/lib/Target/ARM/ARMInstrVFP.td
166

I'll create a predicated pattern

llvm/test/CodeGen/ARM/bfloat.ll
10

Oops, didn't notice. I'll remove them.

dmgreen added inline comments.Jun 18 2020, 7:02 AM
llvm/lib/Target/ARM/ARMISelLowering.cpp
731

We will need to fix bfloat + nofullfp16 anyway, and so I'm not sure it will make a lot of difference in the short term to make this fullfp16.

My point is that there will be a lot of operations - like adding two bfloats, or extracting from a bfloat vector or concating two bfloat vectors. In the long run we are going to make sure all those things work and don't crash during legalization. At the moment I expect a lot of them to break. For this patch it is probably OK to ignore that for the moment. But it should be something we look at eventually.

labrinea updated this revision to Diff 271708.Jun 18 2020, 7:15 AM

Addressed last round's review comments.

dmgreen accepted this revision.Jun 18 2020, 7:45 AM

Thanks. LGTM. We need to follow up on getting this working without fullfp16, but I think it's good to get this combo going first.

This revision is now accepted and ready to land.Jun 18 2020, 7:45 AM
stuij added a subscriber: stuij.Jun 18 2020, 7:56 AM
stuij added inline comments.
llvm/lib/Target/ARM/ARMISelLowering.cpp
731

Yes, we are not supporting the full range yet, either on AArch32 or AArch64. Adding more support is a todo. We do have follow-up patches on phab that add support for extraction/insertion, concatenation, and a number of others.

One nice thing is that since all the bfloat operations go through intrinsics (when using C/C++), the surface area of possible operations is greatly reduced and known. Currently on AArch64, the IR generated from those intrinsics is the input for testing the SelDag lowering (like we do for half as well).

dmgreen added inline comments.Jun 18 2020, 8:16 AM
llvm/lib/Target/ARM/ARMISelLowering.cpp
731

Yeah OK. I would worry about the optimizer producing weird and wonderful variants on the input, so you may find a few more are needed. But hopefully most of that will be covered with what we have.

stuij added inline comments.Jun 18 2020, 8:43 AM
llvm/lib/Target/ARM/ARMISelLowering.cpp
731

Yea, I do worry too :)

This revision was automatically updated to reflect the committed changes.