This is an archive of the discontinued LLVM Phabricator instance.

[X86] Enable AVX512_BF16 instructions, which are supported for BFLOAT16 in Cooper Lake
ClosedPublic

Authored by liutianle on Apr 10 2019, 6:43 PM.

Details

Summary
  1. Enable the infrastructure for AVX512_BF16, the BFLOAT16 extension supported in Cooper Lake;
  2. Enable the VCVTNE2PS2BF16, VCVTNEPS2BF16 and VDPBF16PS instructions: Vector Neural Network Instructions that take BFLOAT16 inputs, plus conversion instructions from IEEE single precision.

VCVTNE2PS2BF16: Convert Two Packed Single Data to One Packed BF16 Data.
VCVTNEPS2BF16: Convert Packed Single Data to Packed BF16 Data.
VDPBF16PS: Dot Product of BF16 Pairs Accumulated into Packed Single Precision.
For more details about the BF16 ISA, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference
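
As a rough illustration of what the three instructions compute, here is a scalar C++ sketch of one lane. It is not taken from the patch: the helper names (`floatToBF16`, `bf16ToFloat`, `dotBF16Pair`) are hypothetical, and the model assumes plain round-to-nearest-even truncation to the upper 16 bits. NaN and denormal handling, masking, and the hardware's exact intermediate rounding are deliberately left out, so treat the ISE document as the authority on those details.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers: move bits between float and uint32_t without UB.
static uint32_t floatBits(float F) {
  uint32_t U;
  std::memcpy(&U, &F, sizeof(U));
  return U;
}

static float bitsToFloat(uint32_t U) {
  float F;
  std::memcpy(&F, &U, sizeof(F));
  return F;
}

// Sketch of one lane of VCVTNEPS2BF16 / VCVTNE2PS2BF16: keep the upper
// 16 bits of the IEEE single, rounding to nearest with ties to even.
// Assumption: NaN and denormal special cases are not modelled.
static uint16_t floatToBF16(float F) {
  assert(F == F && "NaN inputs are not modelled in this sketch");
  uint32_t U = floatBits(F);
  // Adding 0x7FFF plus the lowest kept bit makes exact ties round to even.
  uint32_t RoundingBias = 0x7FFFu + ((U >> 16) & 1u);
  return static_cast<uint16_t>((U + RoundingBias) >> 16);
}

// A BF16 value widens back to float by placing it in the upper 16 bits.
static float bf16ToFloat(uint16_t H) {
  return bitsToFloat(static_cast<uint32_t>(H) << 16);
}

// Sketch of one 32-bit lane of VDPBF16PS: each lane holds a pair of BF16
// values per source, and their products are accumulated into a single.
// Assumption: ordinary float arithmetic stands in for the hardware's
// internal rounding behaviour.
static float dotBF16Pair(float Acc, const uint16_t A[2], const uint16_t B[2]) {
  return Acc + bf16ToFloat(A[0]) * bf16ToFloat(B[0]) +
         bf16ToFloat(A[1]) * bf16ToFloat(B[1]);
}
```

In this model VCVTNE2PS2BF16 differs from VCVTNEPS2BF16 only in that lanes from two source vectors are converted and packed into one destination vector of the same width.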

Event Timeline

liutianle created this revision. Apr 10 2019, 6:43 PM
craig.topper retitled this revision from "Enable AVX512_BF16 instructions, which are supported for BFLOAT16 in Cooper Lake" to "[X86] Enable AVX512_BF16 instructions, which are supported for BFLOAT16 in Cooper Lake". Apr 10 2019, 7:13 PM
craig.topper added reviewers: RKSimon, spatel.
liutianle updated this revision to Diff 194634. Apr 10 2019, 8:47 PM

Add test files to decode.

@RKSimon, I did a review on this internally before it was posted. Do you mind looking over it? Thanks.

Some initial thoughts - I don't know a lot about the bfloat16 instructions so need to read up when I get the chance.

include/llvm/IR/IntrinsicsX86.td
4851

Is there no way around this? Other conversions don't need this.

lib/Target/X86/X86ISelLowering.h
513

Bit more description would be good if possible - those enums look very similar at first glance!

craig.topper added inline comments. Apr 17 2019, 2:04 PM
include/llvm/IR/IntrinsicsX86.td
4851

I believe a lot of our masked conversion intrinsics never got their masking separated out to select in IR. So they haven't encountered this issue yet. I did put in MCVT* ISD opcodes for the older masked conversions to fix PR34877 a few months ago.

I do wonder whether, in a future state with strict FP support, we should keep the masking as part of all the floating point intrinsics.

liutianle updated this revision to Diff 196166. Apr 22 2019, 6:48 PM
liutianle marked an inline comment as done. Apr 22 2019, 6:52 PM
liutianle added inline comments.
lib/Target/X86/X86ISelLowering.h
513

@RKSimon, I updated it. Please review again.

craig.topper added inline comments. Apr 23 2019, 9:56 PM
lib/Target/X86/X86ISelLowering.h
512

Mention the difference between CVTNE2PS2BF16 and CVTNEPS2BF16. i.e. that CVTNE2PS2BF16 compresses two vectors to one.

513

Separate DPBF16PS on to its own line with its own description below MCVTNEPS2BF16

513

presision -> precision

@RKSimon, @craig.topper, I updated it. Please review again.

@RKSimon does this look ok to you other than the typos?

lib/Target/X86/X86ISelLowering.h
515

dingle->single

lib/Target/X86/X86Subtarget.h
356

extenstions->extensions

RKSimon added inline comments. Apr 29 2019, 1:02 PM
lib/Target/X86/X86InstrAVX512.td
12674

Is SchedWriteVecALU a realistic scheduler class? It's typically used for vector integer add/sub/and/bitops.

liutianle updated this revision to Diff 198165. May 4 2019, 11:34 PM
This revision is now accepted and ready to land. May 5 2019, 10:13 AM
This revision was automatically updated to reflect the committed changes.