Page MenuHomePhabricator

[X86] Enable AVX512_BF16 instructions, which are supported for BFLOAT16 in Cooper Lake

Authored by liutianle on Apr 10 2019, 6:43 PM.


  1. Enable infrastructure of AVX512_BF16, which is supported for BFLOAT16 in Cooper Lake;
  2. Enable VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS instructions, which are Vector Neural Network Instructions supporting BFLOAT16 inputs and conversion instructions from IEEE single precision.

VCVTNE2PS2BF16: Convert Two Packed Single Data to One Packed BF16 Data.
VCVTNEPS2BF16: Convert Packed Single Data to Packed BF16 Data.
VDPBF16PS: Dot Product of BF16 Pairs Accumulated into Packed Single Precision.
For more details about BF16 isa, please refer to the latest ISE document:

Diff Detail

Event Timeline

liutianle created this revision.Apr 10 2019, 6:43 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 10 2019, 6:43 PM
craig.topper retitled this revision from Enable AVX512_BF16 instructions, which are supported for BFLOAT16 in Cooper Lake to [X86] Enable AVX512_BF16 instructions, which are supported for BFLOAT16 in Cooper Lake.Apr 10 2019, 7:13 PM
craig.topper added reviewers: RKSimon, spatel.
liutianle updated this revision to Diff 194634.Apr 10 2019, 8:47 PM

Add test files to decode.

@RKSimon, I did a review on this internally before it was posted. Do you mind looking over it? Thanks.

Some initial thoughts - I don't know a lot about the bfloat16 instructions so need to read up when I get the chance.


Is there no way around this - other conversions don't need this.


Bit more description would be good if possible - those enums look very similar at first glance!

craig.topper added inline comments.Apr 17 2019, 2:04 PM

I believe a lot of our masked conversion intrinsics never got their masking separated out to select in IR. So they haven't encountered this issue yet. I did put in MCVT* ISD opcodes for the older masked conversions to fix PR34877 a few months ago.

I do wonder if in a future state with strict FP support if we should keep the masking as part of all the floating point intrinsics.

liutianle updated this revision to Diff 196166.Apr 22 2019, 6:48 PM
liutianle marked an inline comment as done.Apr 22 2019, 6:52 PM
liutianle added inline comments.

@RKSimon , I update it. Please review again.

craig.topper added inline comments.Apr 23 2019, 9:56 PM

Mention the difference between CVTNE2PS2BF16 and CVTNEPS2BF16. i.e. that CVTNE2PS2BF16 compresses two vectors to one.


Separate DPBF16PS on to its own line with its own description below MCVTNEPS2BF16


presision -> precision

@RKSimon , @craig.topper , I updated it. Please review again.

@RKSimon does this look ok to you other than the typos?





RKSimon added inline comments.Apr 29 2019, 1:02 PM

Is SchedWriteVecALU a realistic scheduler class? Its typically used for vector integer add/sub/and/bitops

liutianle updated this revision to Diff 198165.May 4 2019, 11:34 PM
This revision is now accepted and ready to land.May 5 2019, 10:13 AM
This revision was automatically updated to reflect the committed changes.