This is an archive of the discontinued LLVM Phabricator instance.

[AVX-512] - Add FMA instruction with Rounding mode
Needs ReviewPublic

Authored by AsafBadouh on Jan 4 2015, 3:29 AM.

Details

Summary

add support for FMA with Rounding mode:
{rn-sae} Round to nearest (even) + SAE
{rd-sae} Round down (toward -inf) + SAE
{ru-sae} Round up (toward +inf) + SAE
{rz-sae} Round toward zero (Truncate) + SAE

Diff Detail

Repository
rL LLVM

Event Timeline

AsafBadouh updated this revision to Diff 17774.Jan 4 2015, 3:29 AM
AsafBadouh retitled this revision from to [AVX-512] - Add FMA instruction with Rounding mode.
AsafBadouh updated this object.
AsafBadouh edited the test plan for this revision. (Show Details)
AsafBadouh added a reviewer: delena.
AsafBadouh set the repository for this revision to rL LLVM.
delena edited edge metadata.Jan 4 2015, 4:19 AM

Tests?

lib/Target/X86/X86ISelLowering.cpp
16959

if RoundingMode is not constant it should give something like llvm_unreachable

lib/Target/X86/X86ISelLowering.h
378

It should not be FMA specific, it is for all FP instructions

lib/Target/X86/X86InstrAVX512.td
3601

Just add rrb form to PSZ and PDZ:

multiclass avx512_fma3p<bits<8> opc213, bits<8> opc231,

string OpcodeStr,
SDPatternOperator OpNode> {

let ExeDomain = SSEPackedSingle in {

  defm NAME##PSZ      : avx512_fma3p_forms<opc213, opc231, OpcodeStr,  v16f32_info, OpNode>,
                                      avx512_fma3_rm_rrb<opc213, opc231, OpcodeStr,  v16f32_info, OpNode>, EVEX_V512;
  defm NAME##PSZ256   : avx512_fma3p_forms<opc213, opc231, OpcodeStr, v8f32x_info, OpNode>, EVEX_V256;
  defm NAME##PSZ128   : avx512_fma3p_forms<opc213, opc231, OpcodeStr,  v4f32x_info, OpNode>, EVEX_V128;
}

let ExeDomain = SSEPackedDouble in {

  defm  NAME##PDZ     : avx512_fma3p_forms<opc213, opc231, OpcodeStr,  v8f64_info, OpNode>, 
                                      avx512_fma3_rm_rrb<opc213, opc231, OpcodeStr,  v16f32_info, OpNode>, EVEX_V512, VEX_W;
  defm  NAME##PDZ256  : avx512_fma3p_forms<opc213, opc231, OpcodeStr,
                                           v4f64x_info, OpNode>, EVEX_V256, VEX_W;
  defm  NAME##PDZ128  : avx512_fma3p_forms<opc213, opc231, OpcodeStr,
                                           v2f64x_info, OpNode>, EVEX_V128, VEX_W;
}

}

AsafBadouh updated this revision to Diff 17777.Jan 4 2015, 7:55 AM
AsafBadouh edited edge metadata.

Add the missing tests
apply Elena's comments.

delena added inline comments.Jan 4 2015, 11:40 PM
lib/Target/X86/X86InstrAVX512.td
3562

I think, you don't need to make rrb for all 3 forms. You can remove 132 and do not complicate the code.

3575

Please, use "round" instead of "rm", because "rm" means register-memory in all other patterns

AsafBadouh updated this revision to Diff 17783.Jan 5 2015, 12:45 AM

rm->round
remove 132 version

Good! just fix 80 chars line in source files and send to the community

AsafBadouh updated this revision to Diff 17787.Jan 5 2015, 1:38 AM
AsafBadouh added a subscriber: Unknown Object (MLST).

LGTM.

Is it possible to mod INTR_TYPE_1OP_MASK_RM and INTR_TYPE_SCALAR_MASK_RM to use the new X86ISD::ROUNDMODE node? This could be done under a separate patch.

anemet added inline comments.Jan 6 2015, 5:04 PM
lib/Target/X86/X86ISelLowering.h
374

Hmm, has this been completely thought through? I may have missed the discussion...

Looks like you're introducing a new node that if it surrounds an FMA op it changes its rounding mode?

What happens if the two nodes get separated by some transformation?

lib/Target/X86/X86InstrAVX512.td
3541–3542

I think that the EVEX_CD8 thing can be written as VTI.CD8TupleForm.

Same later.

AsafBadouh updated this revision to Diff 17893.Jan 8 2015, 4:53 AM
AsafBadouh added reviewers: nadav, chandlerc, halfdan.

Hi Adam,

We'd like to submit this code and proceed.
The goal is to let setting rounding mode in intrinsics. The operations that we talk about are FP arithmetic - like FADD, FMUL, FSUB, FDIV, FMA - 512 bit vector only.
And FP conversions. Nothing more, so there is no correlation with masks.

I see 2 options right now.

  1. define an additional X86 node type for each intrinsic
  2. Wrap the existing node with ROUNDMODE

In the patch we are giving solution number 2 and say that it is safe. If ROUNDMODE and FADD will be separated, the compilation will fail with "cannot select".
Do you still have any concerns?

Thank you for reviewing this.

Say something like:

(AVX512OpWithRounding fadd, op1, op2, <rounding mode, ...>)

We proposed the following, and there is no need to reinvent the wheel

Option 2: (AVX512OpWithRounding (fadd, op1, op2,..), <rounding mode>)
And less preferable, in my opinion
Option 1: (AVX512OpWithRounding_fadd, op1, op2,.. <rounding mode>)

  • Elena

Some minor comments.

lib/Target/X86/X86InstrAVX512.td
3547

There are trailing spaces in this hunk.

And I feel it should not be between avx512_fma3p_forms and avx512_fma3p_rm but before or after.

3583

Is a new line intended?

test/CodeGen/X86/avx512-fma-intrinsics.ll
248

It seems to me that this is exactly the same test as "test_x86_vfmadd_ps_z". Shouldn't you just update the check in the existing tests?

I fixed, committed revision 233528.