Enable FP16 binary operator instructions.
clang/include/clang/Basic/BuiltinsX86.def | ||
---|---|---|
1860 | Why are there no 256-bit and 128-bit versions of addph, subph, mulph, and divph? | |
clang/lib/Headers/avx512fp16intrin.h | ||
312 | _CMP_NEQ_OS? | |
318 | Why is it OQ and not UQ? Ditto for all the other ucomi intrinsics. | |
516 | Why is there rounding control for the max/min operations? | |
669 | Will it be combined into one instruction? If __B[0] is 0 and mask[0] is 0, is there no exception? | |
698 | Do we have rounding control for min? | |
757 | This name may be misleading; it means suppress exceptions, right? | |
952 | It seems there is no mask for the reduce operation. | |
963 | Not sure if there is any room to optimize. The operations on elements 2 and 3 are unnecessary. | |
clang/lib/Headers/avx512vlfp16intrin.h | ||
366 | Ditto | |
394 | Ditto. | |
clang/test/CodeGen/X86/avx512fp16-builtins.c | ||
639 | Need a blank line? | |
645 | Ditto. |
- Rebase to the first merged FP16 patch.
- Address Yuanke's comments.
- Add parentheses around casts.
- Remove OptForSize predicate for vmovsh.
- Add more immediate values to the encoding tests of vcmpph/sh.
clang/include/clang/Basic/BuiltinsX86.def | ||
---|---|---|
1860 | They don't support rounding control, so we can directly use operations like fadd, fsub, etc. | |
clang/lib/Headers/avx512fp16intrin.h | ||
312 | _CMP_NEQ_US is correct, see the operation in the intrinsic guide (we should update it for sh): | |
318 | OQ requires that both inputs are non-NaN. This matches the intrinsic's behavior: the "u" in the intrinsic name corresponds to the "Q" in the macro name, not the "U". | |
516 | They still need to control exceptions. | |
669 | We should consider this case, but only under strict FP mode. We have ss/sd defined in this way too. We should fix them in the future. | |
698 | No, but we need to control exceptions. | |
757 | Yes. But we are widely using it for SAE in existing code :) |
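For reference, the truth values of the predicates discussed in the 312 and 318 comments can be sketched in plain C. This is only an illustration, not code from the patch: the helper names eq_ordered, eq_unordered and neq_unordered are hypothetical, float stands in for _Float16, and the sketch models only the comparison result, not the signaling/quiet (S/Q) exception behavior.

```c
#include <math.h>
#include <stdbool.h>

/* "Ordered" equal, as in _CMP_EQ_OQ: false whenever either input is NaN. */
bool eq_ordered(float a, float b) {
    return !isunordered(a, b) && a == b;
}

/* "Unordered" equal, as in _CMP_EQ_UQ: true whenever either input is NaN. */
bool eq_unordered(float a, float b) {
    return isunordered(a, b) || a == b;
}

/* "Unordered" not-equal, as in _CMP_NEQ_US: true whenever either input
   is NaN, otherwise true only when the values differ. */
bool neq_unordered(float a, float b) {
    return isunordered(a, b) || a != b;
}
```

With a NaN operand, eq_ordered returns false while eq_unordered and neq_unordered return true; for two equal non-NaN values, both equal predicates return true and neq_unordered returns false.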
Address Yuanke's comments.
llvm/lib/Target/X86/X86InstrFoldTables.cpp | ||
---|---|---|
4838 | No, because the register size of the scalar intrinsics is larger than the corresponding memory size: | |
llvm/test/CodeGen/X86/avx512fp16-fmaxnum.ll | ||
27 | avx512fp16 implies avx512vl. |
clang/lib/Headers/avx512vlfp16intrin.h | ||
---|---|---|
368 | According to https://llvm.org/docs/LangRef.html#llvm-vector-reduce-add-intrinsic, would -0.0f16 be a better start value? |
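The point behind the -0.0 suggestion can be sketched in plain C, assuming the default round-to-nearest mode. This is an illustration only, not code from the patch: reduce_fadd and reduces_to_negative_zero are hypothetical helpers, and float stands in for _Float16. Under IEEE 754, -0.0 + x == x for every x, whereas +0.0 + (-0.0) yields +0.0, so only -0.0 is a true neutral start value for a floating-point add reduction.

```c
#include <math.h>
#include <stddef.h>

/* Sequential floating-point add reduction, starting from `start`. */
float reduce_fadd(float start, const float *v, size_t n) {
    float acc = start;
    for (size_t i = 0; i < n; ++i)
        acc += v[i];
    return acc;
}

/* Returns nonzero if reducing four -0.0 elements from `start`
   preserves the sign, i.e. the result is still -0.0. */
int reduces_to_negative_zero(float start) {
    const float v[4] = {-0.0f, -0.0f, -0.0f, -0.0f};
    return signbit(reduce_fadd(start, v, 4)) != 0;
}
```

Starting from -0.0f keeps the sign of an all -0.0 input, while starting from +0.0f turns the result into +0.0.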
llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp | ||
---|---|---|
3197 | Changing this created a bug where the following is now accepted: cmpeqsh %xmm0, %xmm1, %k1 | |
llvm/lib/Target/X86/MCTargetDesc/X86ATTInstPrinter.cpp | ||
169 | What about the equivalent code in the Intel printer? | |
llvm/lib/Target/X86/X86InstrAVX512.td | ||
2670 | This should be synchronized with the CMPSS/CMPSD code. |