
[X86] Enable AVX512_BF16 instructions, which are supported for BFLOAT16 in Cooper Lake
Closed · Public

Authored by liutianle on Apr 10 2019, 6:43 PM.

Details

Summary
  1. Enable the AVX512_BF16 infrastructure, which provides BFLOAT16 support in Cooper Lake;
  2. Enable the VCVTNE2PS2BF16, VCVTNEPS2BF16, and VDPBF16PS instructions: conversion instructions from IEEE single precision to BFLOAT16, and a Vector Neural Network Instruction that takes BFLOAT16 inputs.
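As a rough illustration of what the conversion instructions compute per lane, here is a scalar C model of one VCVTNEPS2BF16 element. It is a sketch written from the conversion pseudocode in Intel's ISE reference, assuming its rules: round to nearest even via an integer bias add, zero and denormal inputs become a signed zero, infinities pass through, and NaNs are kept quiet. The names `fp32_to_bf16_rne`, `bf16_to_fp32`, `f32_bits`, and `bits_f32` are illustrative, not LLVM or compiler APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bits as uint32_t and back (no value conversion). */
static uint32_t f32_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_f32(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Scalar model of one VCVTNEPS2BF16 lane (sketch after the ISE pseudocode). */
static uint16_t fp32_to_bf16_rne(float x) {
    uint32_t u = f32_bits(x);
    uint32_t exponent = (u >> 23) & 0xFFu;

    if (exponent == 0)                  /* zero or denormal: signed zero */
        return (uint16_t)((u >> 16) & 0x8000u);
    if (exponent == 0xFFu) {            /* infinity or NaN: keep upper 16 bits */
        uint16_t hi = (uint16_t)(u >> 16);
        if (u & 0x007FFFFFu)            /* NaN: force the quiet bit so it stays NaN */
            hi |= 0x0040u;
        return hi;
    }
    /* Normal number: add 0x7FFF plus the LSB of the kept mantissa, then
       truncate. Ties round to even; large values may round up to infinity. */
    uint32_t lsb = (u >> 16) & 1u;
    return (uint16_t)((u + 0x7FFFu + lsb) >> 16);
}

/* BF16 -> FP32 is exact: the BF16 bits become the upper half of a binary32. */
static float bf16_to_fp32(uint16_t b) {
    return bits_f32((uint32_t)b << 16);
}
```

For example, 1.00390625f (1 + 2^-8) lies exactly halfway between two BF16 values and rounds down to 0x3F80 because that neighbor is even, while 1.01171875f (1 + 3*2^-8) rounds up to 0x3F82.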

VCVTNE2PS2BF16: Convert Two Packed Single Data to One Packed BF16 Data.
VCVTNEPS2BF16: Convert Packed Single Data to Packed BF16 Data.
VDPBF16PS: Dot Product of BF16 Pairs Accumulated into Packed Single Precision.
For more details about the BF16 ISA, please refer to the latest Intel Architecture Instruction Set Extensions (ISE) Programming Reference: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference
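VDPBF16PS can be modeled in the same spirit. The sketch below is a scalar C approximation, assuming the ISE pseudocode: each FP32 lane i of the accumulator adds the product of BF16 pair (2i+1) and then the product of pair (2i) from the two sources. DAZ/FTZ handling is not modeled, and `dpbf16ps` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen BF16 to FP32: exact, since BF16 is the upper half of a binary32. */
static float bf16_to_fp32(uint16_t b) {
    uint32_t u = (uint32_t)b << 16;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Hypothetical scalar model of VDPBF16PS over n FP32 lanes.
   a and b each hold 2*n BF16 values. Lane i of acc accumulates the
   pair (2i+1) product first, then the pair (2i) product, following the
   ISE pseudocode. DAZ/FTZ behavior is not modeled. */
static void dpbf16ps(float *acc, const uint16_t *a, const uint16_t *b, int n) {
    for (int i = 0; i < n; ++i) {
        float t = acc[i];
        /* A BF16*BF16 product is exact in FP32 (8-bit significands), barring
           overflow/underflow, so each step rounds once, like an FMA. */
        t += bf16_to_fp32(a[2 * i + 1]) * bf16_to_fp32(b[2 * i + 1]);
        t += bf16_to_fp32(a[2 * i]) * bf16_to_fp32(b[2 * i]);
        acc[i] = t;
    }
}
```

With accumulator {10, 0}, a = {1, 2, 3, 4} and b = {1, 1, 2, 0.5} (as BF16), lane 0 becomes 10 + 2*1 + 1*1 = 13 and lane 1 becomes 0 + 4*0.5 + 3*2 = 8. Using a plain multiply and add instead of `fmaf` keeps the sketch free of libm while rounding the same way for in-range values.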

Diff Detail

Repository
rL LLVM

Event Timeline

liutianle created this revision · Apr 10 2019, 6:43 PM
Herald added a project: Restricted Project · Apr 10 2019, 6:43 PM
craig.topper retitled this revision from "Enable AVX512_BF16 instructions, which are supported for BFLOAT16 in Cooper Lake" to "[X86] Enable AVX512_BF16 instructions, which are supported for BFLOAT16 in Cooper Lake" · Apr 10 2019, 7:13 PM
craig.topper added reviewers: RKSimon, spatel.
liutianle updated this revision to Diff 194634 · Apr 10 2019, 8:47 PM

Add test files for decoding.

@RKSimon, I did a review on this internally before it was posted. Do you mind looking over it? Thanks.

Some initial thoughts: I don't know a lot about the bfloat16 instructions, so I need to read up on them when I get the chance.

include/llvm/IR/IntrinsicsX86.td
Line 4833 (On Diff #194634)

Is there no way around this? Other conversions don't need it.

lib/Target/X86/X86ISelLowering.h
Line 513 (On Diff #194634)

A bit more description would be good if possible; those enums look very similar at first glance!

craig.topper added inline comments · Apr 17 2019, 2:04 PM
include/llvm/IR/IntrinsicsX86.td
Line 4833 (On Diff #194634)

I believe a lot of our masked conversion intrinsics never had their masking separated out into a select in IR, so they haven't encountered this issue yet. I did add MCVT* ISD opcodes for the older masked conversions to fix PR34877 a few months ago.

I do wonder whether, in a future state with strict FP support, we should keep the masking as part of all the floating-point intrinsics.
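One plausible reading of the masking issue above (an assumption, since the thread does not spell it out): the 128-bit VCVTNEPS2BF16 writes only four BF16 results into the low half of the register and zeroes the upper half regardless of the mask, so an unmasked conversion followed by a select over the whole result does not reproduce the masked instruction. The C sketch below models both forms. Truncation stands in for the real round-to-nearest-even conversion because only the lane layout matters here, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in lane conversion: keep the upper 16 bits (truncation).
   The real instruction rounds to nearest even. */
static uint16_t to_bf16_trunc(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return (uint16_t)(u >> 16);
}

/* Unmasked 128-bit form: 4 floats -> words 0..3, words 4..7 zeroed. */
static void cvt_unmasked(uint16_t dst[8], const float src[4]) {
    for (int i = 0; i < 4; ++i) dst[i] = to_bf16_trunc(src[i]);
    for (int i = 4; i < 8; ++i) dst[i] = 0;
}

/* Merge-masked form: mask bits 0..3 choose between the converted value
   and passthru; words 4..7 are zeroed no matter what passthru holds. */
static void cvt_masked(uint16_t dst[8], const float src[4],
                       const uint16_t passthru[8], uint8_t k) {
    for (int i = 0; i < 4; ++i)
        dst[i] = ((k >> i) & 1) ? to_bf16_trunc(src[i]) : passthru[i];
    for (int i = 4; i < 8; ++i) dst[i] = 0;
}

/* Naive "unmasked op + select" over all 8 words with the mask
   zero-extended: words 4..7 come from passthru instead of zero. */
static void cvt_then_select(uint16_t dst[8], const float src[4],
                            const uint16_t passthru[8], uint8_t k) {
    uint16_t tmp[8];
    cvt_unmasked(tmp, src);
    for (int i = 0; i < 8; ++i)
        dst[i] = ((k >> i) & 1) ? tmp[i] : passthru[i];
}
```

With mask 0x5 and a passthru of all 9s, both versions agree in words 0..3, but in words 4..7 the instruction produces zeros while the naive select copies the passthru values.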

liutianle updated this revision to Diff 196166 · Apr 22 2019, 6:48 PM
liutianle marked an inline comment as done · Apr 22 2019, 6:52 PM
liutianle added inline comments.
lib/Target/X86/X86ISelLowering.h
Line 513 (On Diff #194634)

@RKSimon, I updated it. Please review again.

craig.topper added inline comments · Apr 23 2019, 9:56 PM
lib/Target/X86/X86ISelLowering.h
Line 513 (On Diff #194634)

Separate DPBF16PS onto its own line, with its own description, below MCVTNEPS2BF16.

Line 512 (On Diff #196166)

Mention the difference between CVTNE2PS2BF16 and CVTNEPS2BF16, i.e. that CVTNE2PS2BF16 compresses two vectors into one.
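To illustrate the difference being asked about, here is a scalar C sketch of the two conversions. It assumes the operand order from Intel's ISE pseudocode, where the low half of the CVTNE2PS2BF16 result comes from the second source and the high half from the first. Truncation stands in for round-to-nearest-even (exact for the BF16-representable values used below), and the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in lane conversion (truncation to the upper 16 bits); the
   instruction itself rounds to nearest even. Exact for BF16-representable inputs. */
static uint16_t to_bf16_trunc(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return (uint16_t)(u >> 16);
}

/* CVTNEPS2BF16: n floats -> n BF16 values, i.e. a half-width result. */
static void cvtneps2bf16(uint16_t *dst, const float *src, int n) {
    for (int i = 0; i < n; ++i)
        dst[i] = to_bf16_trunc(src[i]);
}

/* CVTNE2PS2BF16: two n-float sources -> one 2n-element BF16 vector.
   Low half from src2, high half from src1 (per the ISE pseudocode). */
static void cvtne2ps2bf16(uint16_t *dst, const float *src1,
                          const float *src2, int n) {
    cvtneps2bf16(dst, src2, n);
    cvtneps2bf16(dst + n, src1, n);
}
```

For example, with src1 = {1, 2} and src2 = {3, 4}, `cvtne2ps2bf16` yields the BF16 encodings of {3, 4, 1, 2}: one full-width vector built from two sources, whereas `cvtneps2bf16` fills only n elements from a single source.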

Line 513 (On Diff #196166)

presision -> precision

@RKSimon, @craig.topper, I updated it. Please review again.

@RKSimon does this look ok to you other than the typos?

lib/Target/X86/X86ISelLowering.h
Line 515 (On Diff #196579)

dingle->single

lib/Target/X86/X86Subtarget.h
Line 356 (On Diff #196579)

extenstions->extensions

RKSimon added inline comments · Apr 29 2019, 1:02 PM
lib/Target/X86/X86InstrAVX512.td
Line 12531 (On Diff #196579)

Is SchedWriteVecALU a realistic scheduler class? It's typically used for vector integer add/sub/and/bitops.

liutianle updated this revision to Diff 198165 · May 4 2019, 11:34 PM
This revision is now accepted and ready to land · May 5 2019, 10:13 AM
This revision was automatically updated to reflect the committed changes.