Enable FP16 binary operator instructions.
Diff Detail
- Repository
 - rG LLVM Github Monorepo
 
Event Timeline
| clang/include/clang/Basic/BuiltinsX86.def | ||
|---|---|---|
| 1860 | Why are there no 256-bit and 128-bit versions of addph, subph, mulph, and divph?  | |
| clang/lib/Headers/avx512fp16intrin.h | ||
| 273 | _CMP_NEQ_OS?  | |
| 279 | Why is it OQ and not UQ? Ditto for all other ucomi intrinsics.  | |
| 477 | Why is there rounding control for the max/min operations?  | |
| 630 | Will this be combined into one instruction? If __B[0] is 0 and mask[0] is 0, is no exception raised?  | |
| 659 | Do we have rounding control for min?  | |
| 718 | This name may be misleading; it means suppress exceptions, right?  | |
| 913 | It seems there is no mask for reduce operation.  | |
| 924 | Not sure if there is any room to optimize. The operations on elements 2 and 3 are unnecessary.  | |
| clang/lib/Headers/avx512vlfp16intrin.h | ||
| 366 | Ditto  | |
| 394 | Ditto.  | |
| clang/test/CodeGen/X86/avx512fp16-builtins.c | ||
| 639 | Need a blank line?  | |
| 645 | Ditto.  | |
- Rebase to the first merged FP16 patch.
- Address Yuanke's comments.
- Add parentheses around casts.
- Remove OptForSize predicate for vmovsh.
- Add more immediate values to the encoding tests of vcmpph/sh.
 
| clang/include/clang/Basic/BuiltinsX86.def | ||
|---|---|---|
| 1860 | Because they don't support rounding control, so we can use instructions like fadd, fsub, etc. directly.  | |
| clang/lib/Headers/avx512fp16intrin.h | ||
| 273 | _CMP_NEQ_US is correct; see the operation in the Intrinsics Guide (we should update it for sh):  | |
| 279 | OQ requires that both inputs are non-NaN, which matches the intrinsic's behavior. The "u" in the intrinsic name corresponds to "Q", not "U", in the macro name.  | |
| 477 | They still need to control exceptions.  | |
| 630 | We should consider this case, but only under strict FP mode. We have ss/sd defined in this way too. We should fix them in the future.  | |
| 659 | No, but we need to control exceptions.  | |
| 718 | Yes. But we are widely using it for SAE in existing code :)  | |
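The ordered versus unordered distinction discussed for line 279 can be sketched with a scalar model. This is only an illustration: it uses plain `float` instead of `_Float16`, the helper names `cmp_eq_ordered` and `cmp_eq_unordered` are hypothetical, and it does not model the quiet versus signaling ("Q" versus "S") part of the predicate, which only affects whether a quiet NaN raises an invalid exception.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical scalar model of an "ordered" equality predicate
   (the "O" in _CMP_EQ_OQ): true only when neither operand is NaN
   and the values compare equal. */
static bool cmp_eq_ordered(float a, float b) {
    return !isnan(a) && !isnan(b) && a == b;
}

/* Hypothetical scalar model of an "unordered" equality predicate
   (the "U" in _CMP_EQ_UQ): true when either operand is NaN
   or the values compare equal. */
static bool cmp_eq_unordered(float a, float b) {
    return isnan(a) || isnan(b) || a == b;
}
```

With a NaN operand, `cmp_eq_ordered(NAN, 1.0f)` is false while `cmp_eq_unordered(NAN, 1.0f)` is true; for two non-NaN operands the two predicates agree.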
Address Yuanke's comments.
| llvm/lib/Target/X86/X86InstrFoldTables.cpp | ||
|---|---|---|
| 4838 | No, because the register size of the scalar intrinsics is larger than the corresponding memory size:  | |
| llvm/test/CodeGen/X86/avx512fp16-fmaxnum.ll | ||
| 27 | avx512fp16 implies avx512vl.  | |
| clang/lib/Headers/avx512vlfp16intrin.h | ||
|---|---|---|
| 368 | Per https://llvm.org/docs/LangRef.html#llvm-vector-reduce-add-intrinsic, would -0.0f16 be better?  | |
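The reasoning behind the -0.0 suggestion can be checked with a small scalar sketch. It assumes default IEEE 754 round-to-nearest and no fast-math flags, uses `float` rather than `_Float16`, and the helpers `reduce_add` and `is_negative_zero` are hypothetical names, not part of the patch.

```c
#include <math.h>
#include <stdbool.h>

/* Sequentially add n floats onto the given start value. */
static float reduce_add(float start, const float *v, int n) {
    float acc = start;
    for (int i = 0; i < n; ++i)
        acc += v[i];
    return acc;
}

/* True if x is a zero with the sign bit set. */
static bool is_negative_zero(float x) {
    return x == 0.0f && signbit(x);
}
```

Reducing a vector of all `-0.0f` elements keeps the sign when the start value is `-0.0f`, but yields `+0.0f` when the start value is `+0.0f`, because `(+0.0) + (-0.0)` is `+0.0` under round-to-nearest. For any other input the two start values give the same result.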
| llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp | ||
|---|---|---|
| 3197 | Changing this created a bug where the following is now accepted: cmpeqsh %xmm0, %xmm1, %k1  | |
| llvm/lib/Target/X86/MCTargetDesc/X86ATTInstPrinter.cpp | ||
| 169 | What about the equivalent code in the Intel printer?  | |
| llvm/lib/Target/X86/X86InstrAVX512.td | ||
| 2674 | This should be synchronized with the CMPSS/CMPSD code.  | |