Enable FP16 complex FMA instructions.
"b" means rounding. Right?
Sorry, I couldn't find the constraint in the spec.
Could swapping LHS and RHS remove some redundant code?
The lambda seems to be called only once.
Is it possible for fast and non-fast instructions to be mixed due to inlining? Should we check the instruction's AllowContract flag?
Merge it into the previous line.
Would moving ClobberConstraint before IsCommutable save the code for the default value?
The name doesn't seem accurate. Is it cfmop for mul and cfmaop for fma?
I didn't see this flag on other scalar instructions. Why do we need it for complex instructions?
Why is an FR32X version not needed for complex scalar instructions?
#UD if (dest_reg == src1_reg) or (dest_reg == src2_reg)
Because all complex instructions have the constraint "dst != src1 && dst != src2", we use earlyclobber to prevent dst from being assigned the same register as src1 or src2.
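To make the constraint concrete, here is a minimal sketch of the check that earlyclobber guards against. The helper name and the plain integer register numbers are hypothetical, for illustration only; this is not code from the patch or from LLVM.

```cpp
// Hypothetical helper, not part of LLVM or the patch.
// Registers are modelled as plain integers for illustration.
// Returns true when the operand assignment would hit the #UD condition
// quoted from the spec: dest_reg == src1_reg or dest_reg == src2_reg.
constexpr bool violatesDstConstraint(unsigned Dst, unsigned Src1,
                                     unsigned Src2) {
  return Dst == Src1 || Dst == Src2;
}

// dst reuses src1: the kind of assignment earlyclobber is meant to prevent.
static_assert(violatesDstConstraint(1, 1, 2), "dst == src1 must be rejected");
// dst is distinct from both sources: allowed.
static_assert(!violatesDstConstraint(0, 1, 2), "distinct dst must be accepted");
```

Marking dst as earlyclobber should keep the register allocator from ever producing an assignment for which this check returns true.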
No, I mean we have both X86::XXX and X86::XXX_Int for other instructions. One uses FR16X, which can be unfolded; the other uses VR128X, which can't. For example, VFNMADD213SHZm and VFNMADD213SHZm_Int.
The VFCMULCSHZrr instructions produce two 16-bit values packed into the lower 32 bits. That would mean we would need an FR32X result, but it couldn't interact meaningfully with any other FR32X instruction since it's really two values.
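As a rough illustration of why the 32-bit result is not a float, here is a sketch of the packing. The helper names are hypothetical, and it assumes the real part sits in the lower 16 bits and the imaginary part in the upper 16 bits; the arguments are raw FP16 bit patterns, not converted values.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only, not from the patch.
// Assumption: real part in bits [15:0], imaginary part in bits [31:16].
constexpr uint32_t packComplexHalf(uint16_t RealBits, uint16_t ImagBits) {
  return static_cast<uint32_t>(RealBits) |
         (static_cast<uint32_t>(ImagBits) << 16);
}

// Extract the raw FP16 bit pattern of the real part.
constexpr uint16_t realBits(uint32_t Packed) {
  return static_cast<uint16_t>(Packed & 0xFFFFu);
}

// Extract the raw FP16 bit pattern of the imaginary part.
constexpr uint16_t imagBits(uint32_t Packed) {
  return static_cast<uint16_t>(Packed >> 16);
}

// 0x3C00 is 1.0 and 0x4000 is 2.0 in IEEE half precision.
static_assert(packComplexHalf(0x3C00, 0x4000) == 0x40003C00u,
              "two halves occupy one 32-bit lane");
```

Reading the packed 32 bits as a single float would give an unrelated number, which is why an FR32X result could not feed other FR32X instructions in a useful way.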
I think we only have FR32/FR64 instructions for things that have generic IR equivalents or that we create from other generic IR operations. For example, I think we have an FR32 RCP and RSQRT because we can convert a float div or 1/sqrt to them.
Sorry, my mistake. Here b is supposed to represent the EVEX.b bit in the encoding.