Enable FP16 conversion instructions.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/lib/Headers/avx512fp16intrin.h | ||
---|---|---|
1748 | Does it also return i32 in x86_64 platform? We may unify the intrinsic both for x86 and x86_x64 to return i32. | |
1874 | VCVTPH2PSX support broadcast compared to VCVTPH2PS, but for intrinsics there is no difference. Do we need to add the new intrinsics? Ditto for its variants. |
clang/lib/Headers/avx512fp16intrin.h | ||
---|---|---|
953 | Put extra parentheses here to avoid repeating PR51324. Same for any macro that has a typecast of builtin result. |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
1955 | Sorry, I'm just confused on why the type is the same for ISD::SINT_TO_FP and ISD::FP_TO_SINT? The legalization use src type for ISD::SINT_TO_FP and dst type for ISD::FP_TO_SINT? Why not unify to dst type. | |
1996 | How do we know it covert to v16f16? Is it possible convert to v16f32? | |
2054 | Why it is not v2i16? | |
19998 | Should this node be chained to Op.getOperand(0) for strict FP and convert operation be chained to this node? | |
22003 | InChain for strict FP? | |
22014 | Is there any case for v3f16? | |
31087 | Is it possible the type is i3/i5/i7? | |
31265 | Where is vXi16 handle? Is it promoted to vXi32 finally? | |
31280 | Isn't the result type changed to v8f16? Why we don't extract sub-vector here? | |
49327 | Need to check Subtarget.hasFP16() ? | |
llvm/lib/Target/X86/X86InstrAVX512.td | ||
8193 | Why null_frag instead of X86vfpround? | |
8199 | What is the alias used for? Can't it be distinguished from operand? |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
1955 | This is the rules for how LegalizeDAG.cpp and LegalizeVectorOps looks up the operation action for these nodes. It always use the integer type. I believe this is because for scalar types, there are more legal integer types than there are legal conversions. The FP types were already checked for legality by the type legalizer. This has gotten a lot more complicated for vectors. |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
31087 | We should only get here for integer type that we enabled Custom handling for. | |
llvm/lib/Target/X86/X86InstrAVX512.td | ||
8193 | The masked forms require X86vmfpround and the unmasked forms require X86any_vfpround. Template doesn't handle that. So there are extra patterns later | |
8199 | The memory form can't be distinquished by operand and requires a suffix. This alias exists so that the register form can also use a suffix even though it isn't required. It allows an "rm" constraint to be used for a suffixed mnemonic in inline assembly without needing to know if the compiler will pick register or memory. |
- Address Yuanke's comments.
- Add missed strict FP handling.
- Refactor the repeated declarations for strict FP.
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
1996 | No. Because v16f32 is not a legal type on X86. | |
2054 | This is used to customize vector widen, which always check the action of result type. | |
19998 | Not, we just chain FP nodes together. | |
22003 | Good catch. | |
22014 | No, v3f16 will be widen to v4f16 first. | |
31265 | No. i16 and f16 has the same element size. So we don't need to replace them with custom nodes. | |
31280 | Yes. The common widen code widen both src and dst elements as the same size. We are customizing to different size here so than we can always select the 128 bit instructions. Result type larger than 128 bits doesn't have this problem. | |
49327 | No. We can't go to here without feature FP16 enabled. |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
1996 | Sorry, I mistook it with v32f32. |
I suspect this error from this or D105331
https://lab.llvm.org/buildbot/#/builders/85/builds/6132
Thanks @vitalybuka for the information. I didn't receive this buildbot failure notice. I found the latest build has turned green, but I didn't find which commit fixed it. I'll keep watching it for a while.
Put extra parentheses here to avoid repeating PR51324. Same for any macro that has a typecast of builtin result.