
[ARM][BFloat] Lowering of create/get/set/dup intrinsics
Closed, Public

Authored by miyuki on Jun 8 2020, 10:07 AM.

Details

Summary

This patch adds codegen for the following BFloat
operations to the ARM backend:

  • concatenation of bf16 vectors
  • bf16 vector element extraction
  • bf16 vector element insertion
  • duplication of a bf16 value into each lane of a vector
  • duplication of a bf16 vector lane into each lane
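As a rough illustration of what the five operations above compute, here is a portable reference model, not the ARM lowering itself. It assumes a bf16 value is represented by its raw 16-bit pattern (the upper half of an IEEE binary32 value) and a vector by a plain list of such patterns. All helper names are hypothetical, and the float-to-bf16 conversion truncates rather than rounds, purely to keep the sketch short.

```python
import struct


def f32_to_bf16_bits(x):
    """Return the bf16 bit pattern of x by keeping the top 16 bits of its
    binary32 encoding. Assumption: plain truncation, no rounding."""
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 16


def bf16_bits_to_f32(bits):
    """Widen a bf16 bit pattern back to a Python float (exact)."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def combine(low, high):
    """Concatenate two bf16 vectors; `low` supplies the low lanes."""
    return list(low) + list(high)


def get_lane(vec, lane):
    """Extract one bf16 element from a vector."""
    return vec[lane]


def set_lane(value, vec, lane):
    """Return a copy of `vec` with one lane replaced by `value`."""
    out = list(vec)
    out[lane] = value
    return out


def dup_n(value, lanes):
    """Duplicate a bf16 scalar into every lane of a new vector."""
    return [value] * lanes


def dup_lane(vec, lane, lanes):
    """Duplicate one lane of `vec` into every lane of a new vector."""
    return [vec[lane]] * lanes
```

For example, with `v = [f32_to_bf16_bits(x) for x in (1.0, 2.0, 3.0, 4.0)]`, `dup_lane(v, 1, 8)` yields eight copies of the pattern for 2.0, and `set_lane(f32_to_bf16_bits(1.0), v, 3)` replaces only the last lane while leaving `v` unchanged.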

Event Timeline

miyuki created this revision. Jun 8 2020, 10:07 AM

Returns look wrong, but they should be fixed in the calling convention patch.

Yeah, it's hard to tell which parts of the tests come from bad calling conventions and which do not. If you need this sooner than the other patch is available (and loads/stores work), you could load/store the bfloat, as opposed to directly returning the value.

llvm/lib/Target/ARM/ARMInstrNEON.td
6503

Does VMOVRH require fullfp16? Am I right in saying that bfloat doesn't require the set of instructions we put into HasFPRegs16? That sounds like a pain.

llvm/test/CodeGen/ARM/bf16-create-get-set-dup.ll
90

Can you switch the operands here to show it doing something?

miyuki marked an inline comment as done. Jun 9 2020, 4:53 AM
miyuki added inline comments.
llvm/lib/Target/ARM/ARMInstrNEON.td
6503

VMOVRH requires HasFPRegs16 and VMOVH requires HasFullFP16. But you are right: in the Arm ARM both instructions require the same extension (HaveFP16Ext), and HaveAArch32BF16Ext is a separate extension. We need separate patterns for the cases where BF16 is enabled but FP16 is not.

miyuki updated this revision to Diff 269551. Jun 9 2020, 8:22 AM
  • Addressed reviewer's comments
  • Rebased on top of a patch which implements the correct calling convention for returning bfloat values
miyuki marked 2 inline comments as done. Jun 9 2020, 8:23 AM
dmgreen accepted this revision. Jun 9 2020, 11:38 AM

Sounds good to me. Thanks

This revision is now accepted and ready to land. Jun 9 2020, 11:38 AM
miyuki updated this revision to Diff 272032. Jun 19 2020, 5:47 AM

Rebased. Added a workaround for bf16 arguments and returns (lowering currently works only with +fullfp16).

This revision was automatically updated to reflect the committed changes.