Enable FP16 FMA instructions.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/X86/stack-folding-fp-avx512fp16vl-fma.ll | ||
---|---|---|
194–195 | I was just scanning through this patch and noticed the capitalization mismatch on these lines and others. This test has no valid checks as written? |
llvm/test/CodeGen/X86/stack-folding-fp-avx512fp16vl-fma.ll | ||
---|---|---|
194–195 | Good catch! Yes, these two tests were manually written and lit doesn't report fail for such mismatch. |
clang/lib/Headers/avx512fp16intrin.h | ||
---|---|---|
2429 | Add outer brackets to all the defines to prevent precedence issues: #define _mm512_fmadd_round_ph(A, B, C, R) \ ((__m512h) __builtin_ia32_vfmaddph512_mask( \ (__v32hf)(__m512h)(A), (__v32hf)(__m512h)(B), (__v32hf)(__m512h)(C), \ (__mmask32)-1, (int)(R))) |
Rebased.
Add extra parentheses for macro.
clang/lib/Headers/avx512fp16intrin.h | ||
---|---|---|
2429 | Thanks Simon. |
clang/include/clang/Basic/BuiltinsX86.def | ||
---|---|---|
2010 | Can we arrange the vfmaddph variant together? Move it to line 1997? | |
2014 | What does "3" stand for? | |
clang/lib/Headers/avx512vlfp16intrin.h | ||
1385 | Sorry, I'm confused sometimes we use mask builtin, sometimes we use select builtin. Any guideline on it? | |
llvm/include/llvm/IR/IntrinsicsX86.td | ||
5713 | I notice there is no builtin bound to this intrinsic. What is it used for? | |
5731 | ph? | |
llvm/lib/Target/X86/X86InstrFMA3Info.cpp | ||
145 | Looks some redundant logic. Only X86II::EVEX and X86II::T_MAP6 is special for FP16? | |
160 | Can we integrate it to FMA3GROUP_PACKED_AVX512() with PH extended? | |
163 | Ditto. | |
166 | Ditto. | |
170 | Ditto. | |
174 | Seems we only need FP16Groups be separate table. | |
179 | Ditto. | |
llvm/test/CodeGen/X86/avx512fp16-fma-commute.ll | ||
10 | The name 123 is not the same with the generated instruction (213sh). Is it expected? | |
llvm/test/CodeGen/X86/vec-strict-128-fp16.ll | ||
105 | Is it necessary to test 132, 231 version? | |
llvm/test/CodeGen/X86/vec-strict-256-fp16.ll | ||
105 | Ditto. | |
llvm/test/CodeGen/X86/vec-strict-512-fp16.ll | ||
104 | Ditto. |
clang/include/clang/Basic/BuiltinsX86.def | ||
---|---|---|
2014 | The 3 is there because AMD's 4 operand fma used vfmaddss/vfmaddsd. So Intel's 3 operand used vfmaddss3/vfmaddsd3. That naming is being carried forward here. | |
clang/lib/Headers/avx512vlfp16intrin.h | ||
1385 | Ideally FP should never use select because it doesn't convey that exceptions should be masked for strictfp. But the mistake was already made for add/sub/mul/div/fma/etc years ago before strictfp support existed in llvm. fp16 is intentionally following float/double for consistency. | |
llvm/include/llvm/IR/IntrinsicsX86.td | ||
5713 | It is manually selected in CGBuiltin.cpp | |
5731 | It's scalar so it shouldn't be ph. This matches int_x86_avx512_vfmadd_f32 and int_x86_avx512_vfmadd_f64. They don't use ss/sd because the ss/sd names are usually used for intrinsics that 128-bit operands and only modify the lower element. int_x86_avx512_vfmadd_f32 and int_x86_avx512_vfmadd_f64 have float/double inputs and produce float/double results. | |
llvm/test/CodeGen/X86/avx512fp16-fma-commute.ll | ||
10 | 123 represents how the 3 arguments to the fucntion are mapped to the 3 intrinsic arguments that it calls. There are 6 possible permutations which are all tested here, but only 3 instruction mnemonics. |
Address Yuanke's comments. Thanks Yuanke and Craig.
clang/include/clang/Basic/BuiltinsX86.def | ||
---|---|---|
2010 | We followed what're ps/pd doing. As Craig explained, this is history's legacy. We should fix them in future. | |
llvm/test/CodeGen/X86/vec-strict-128-fp16.ll | ||
105 | 213 is the preferred version due to its order in SelectCode table. We can test others by using memory input, but they are covered in stack-folding-fp-avx512fp16vl-fma.ll. I don't think it's necessary to test here. |
Can we arrange the vfmaddph variant together? Move it to line 1997?
Why there is no mask version for 128 and 256?