add support for FMA with Rounding mode:
{rn-sae} Round to nearest (even) + SAE
{rd-sae} Round down (toward -inf) + SAE
{ru-sae} Round up (toward +inf) + SAE
{rz-sae} Round toward zero (Truncate) + SAE
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Tests?
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
16959 | if RoundingMode is not constant it should give something like llvm_unreachable | |
lib/Target/X86/X86ISelLowering.h | ||
377 | It should not be FMA specific, it is for all FP instructions | |
lib/Target/X86/X86InstrAVX512.td | ||
3602 | Just add rrb form to PSZ and PDZ: multiclass avx512_fma3p<bits<8> opc213, bits<8> opc231, string OpcodeStr, SDPatternOperator OpNode> { let ExeDomain = SSEPackedSingle in { defm NAME##PSZ : avx512_fma3p_forms<opc213, opc231, OpcodeStr, v16f32_info, OpNode>, avx512_fma3_rm_rrb<opc213, opc231, OpcodeStr, v16f32_info, OpNode>, EVEX_V512; defm NAME##PSZ256 : avx512_fma3p_forms<opc213, opc231, OpcodeStr, v8f32x_info, OpNode>, EVEX_V256; defm NAME##PSZ128 : avx512_fma3p_forms<opc213, opc231, OpcodeStr, v4f32x_info, OpNode>, EVEX_V128; } let ExeDomain = SSEPackedDouble in { defm NAME##PDZ : avx512_fma3p_forms<opc213, opc231, OpcodeStr, v8f64_info, OpNode>, avx512_fma3_rm_rrb<opc213, opc231, OpcodeStr, v16f32_info, OpNode>, EVEX_V512, VEX_W; defm NAME##PDZ256 : avx512_fma3p_forms<opc213, opc231, OpcodeStr, v4f64x_info, OpNode>, EVEX_V256, VEX_W; defm NAME##PDZ128 : avx512_fma3p_forms<opc213, opc231, OpcodeStr, v2f64x_info, OpNode>, EVEX_V128, VEX_W; } } |
LGTM.
Is it possible to mod INTR_TYPE_1OP_MASK_RM and INTR_TYPE_SCALAR_MASK_RM to use the new X86ISD::ROUNDMODE node? This could be done under a separate patch.
lib/Target/X86/X86ISelLowering.h | ||
---|---|---|
374 | Hmm, has this been completely thought through? I may have missed the discussion... Looks like you're introducing a new node that if it surrounds an FMA op it changes its rounding mode? What happens if the two nodes get separated by some transformation? | |
lib/Target/X86/X86InstrAVX512.td | ||
3541–3542 | I think that the EVEX_CD8 thing can be written as VTI.CD8TupleForm. Same later. |
Hi Adam,
We'd like to submit this code and proceed.
The goal is to let setting rounding mode in intrinsics. The operations that we talk about are FP arithmetic - like FADD, FMUL, FSUB, FDIV, FMA - 512 bit vector only.
And FP conversions. Nothing more, so there is no correlation with masks.
I see 2 options right now.
- define an additional X86 node type for each intrinsic
- Wrap the existing node with ROUNDMODE
In the patch we are giving solution number 2 and say that it is safe. If ROUNDMODE and FADD will be separated, the compilation will fail with "cannot select".
Do you still have any concerns?
Thank you for reviewing this.
Say something like:
(AVX512OpWithRounding fadd, op1, op2, <rounding mode, ...>)
We proposed the following, and there is no need to reinvent the wheel
Option 2: (AVX512OpWithRounding (fadd, op1, op2,..), <rounding mode>)
And less preferable, in my opinion
Option 1: (AVX512OpWithRounding_fadd, op1, op2,.. <rounding mode>)
- Elena
Some minor comments.
lib/Target/X86/X86InstrAVX512.td | ||
---|---|---|
3547 | There are trailing spaces in this hunk. And I feel it should not be between avx512_fma3p_forms and avx512_fma3p_rm but before or after. | |
3583 | Is a new line intended? | |
test/CodeGen/X86/avx512-fma-intrinsics.ll | ||
248 | It seems to me that this is exactly the same test as "test_x86_vfmadd_ps_z". Shouldn't you just update the check in the existing tests? |
Hmm, has this been completely thought through? I may have missed the discussion...
Looks like you're introducing a new node that if it surrounds an FMA op it changes its rounding mode?
What happens if the two nodes get separated by some transformation?