- Enable FP16 type support and basic declarations used by following patches.
- Enable new instructions VMOVW and VMOVSH.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/lib/Headers/avx512fp16intrin.h | ||
---|---|---|
52 | I think this should be _mm256_undefined_ph(void) | |
62 | I think this should be _mm_undefined_ph(void) | |
66 | I think this should be _mm512_undefined_ph(void) | |
clang/test/CodeGen/X86/avx512fp16-complex.c | ||
1 ↗ | (On Diff #355844) | Can we split _Complex out of this patch? This affects other targets that have _Float16 right? So probably needs a different set of reviewers. |
clang/test/Sema/Float16.c | ||
13 ↗ | (On Diff #355844) | It's odd to change behavior and then have a FIXME asking if the old behavior was correct. |
llvm/lib/Support/X86TargetParser.cpp | ||
204–208 | I think FeaturesICLServer should still be at the beginning of the list. FeatureAVX512FP16 should be alphabetized with the other AVX512 features. Looks like FeatureAVXNNI was already incorrectly alphabetized. | |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
19022–19023 | I think this comment should mention movsh now. | |
19175 | movsh | |
23188–23191 | This should probably include EltVT==MVT::f16 for the FP16 override? | |
llvm/lib/Target/X86/X86InstrFragmentsSIMD.td | ||
410 | Add a blank line above this to match the original formatting | |
997 | This should be with fp32imm0 and friends. |
Address review comments. Thanks Craig!
clang/test/CodeGen/X86/avx512fp16-complex.c | ||
---|---|---|
1 ↗ | (On Diff #355844) | Sure. Split to D105331. Do you know someone who is familiar with or may be interested in it? |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
23188–23191 | Maybe we can only check EltVT == MVT::f16 like this? |
clang/lib/Headers/avx512fp16intrin.h | ||
---|---|---|
39 | I'm hesitating not only for the work but also the effect. We have about 1K new intrinsics and more than 5K LOC in total in the two header files. Adding the doxygen comments will make the readability worse and increase the difficulty in review. It's also a burden in maintaining the correctness. | |
llvm/lib/Target/X86/X86Subtarget.h | ||
748 | Make sense. How about hasAVX512FP16? I can update the name as a followup patch once these patches merged. |
llvm/lib/Target/X86/X86Subtarget.h | ||
---|---|---|
748 | That sounds good to me. We should maybe go back and update some of the others. Especially VNNI since we also have AVXVNNI. |
clang/lib/Headers/avx512fp16intrin.h | ||
---|---|---|
39 | I completely understand where you're coming from. What we do lose is the ability for code editors to display the doxygen when using the intrinsic (or mouseover the code). Are there any particular intrinsics that we could do with having comments closer at hand - ones that take rounding modes that its tricky to remember the enum/defines for or implicit load/store alignments come to mind? I'm not sure about the idea of linking to external docs for specs - do we have a style guide policy on this? |
clang/lib/Headers/avx512fp16intrin.h | ||
---|---|---|
39 |
I only found 3 ones from avx512fintrin.h, anyway, I copied here.
Unfortunately, we didn't add doc for them when enabling avx512 intrinsics.
I was thinking some thing like "See https://llvm.org/LICENSE.txt for license information." in most source files. But I agree doxygen helps for code editors. I didn't think of them simply because I never used them :) Anyway, I guess this is not the block issue for this series patches, right? |
clang/lib/Headers/avx512fp16intrin.h | ||
---|---|---|
255 | 256-bit | |
clang/lib/Headers/avx512vlfp16intrin.h | ||
75 | Why do we use _mm256_set1_epi32 instead of _mm256_set1_epi16? | |
79 | Same question | |
llvm/include/llvm/IR/RuntimeLibcalls.def | ||
290 ↗ | (On Diff #356376) | Is this tested in this patch? |
llvm/lib/Target/X86/X86FastISel.cpp | ||
58 | AVX here should maybe be AVX512, but maybe this is pointing out that this name is bad. Would X86ScalarXMMf* be better? | |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
13576 | Drop curly braces on these. |
- Address Craig's comments.
- Add more patterns for i16 lowering.
clang/lib/Headers/avx512vlfp16intrin.h | ||
---|---|---|
75 | There's no difference in assembly for immediate value. https://godbolt.org/z/sMbrM611d. But the latency of vpbroadcastd is better than vpbroadcastw in Skylake according to intrinsic guide. Here the only effect is consist with _mm256_and_epi32. Do you think it's better to use _mm256_set1_epi16? | |
llvm/include/llvm/IR/RuntimeLibcalls.def | ||
290 ↗ | (On Diff #356376) | No. I'll move it to the 3rd patch and test it there. |
llvm/lib/Target/X86/X86FastISel.cpp | ||
58 | Maybe we can use X86ScalarSSEf16, here SSE means SSE registers? Especially GCC community proposing to support FP16 since SSE2. |
clang/lib/CodeGen/TargetInfo.cpp | ||
---|---|---|
3405 | float -> half? | |
clang/lib/Headers/avx512fp16intrin.h | ||
292 | Just be curious, why not directly use __W? | |
319 | What is may_alias used for? | |
350 | I see in _mm_mask_load_sh(), we create a __m128h with upper bits zero, not sure we also need it in store intrinsic. | |
419 | Why not return __a[0] directly? | |
clang/test/CodeGen/X86/avx512fp16-abi.c | ||
89 | Any false test case that have padding between a and b? | |
llvm/include/llvm/IR/Intrinsics.td | ||
315 | Not sure about the legacy comments, should it be _Float16 now? | |
llvm/include/llvm/Target/TargetSelectionDAG.td | ||
1054 | I notice it is true for other extload. Is it same to "true"? | |
llvm/lib/Target/X86/Disassembler/X86Disassembler.cpp | ||
341 | This is the same to ((byte1 & 0x8) == 0x0)? |
llvm/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp | ||
---|---|---|
801 | Add comments for map5 and map6? | |
llvm/lib/Target/X86/X86.td | ||
189 | customize? | |
llvm/lib/Target/X86/X86FastISel.cpp | ||
2291 | Also add it in isCMOVPseudo()? | |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
1946 | Drop the brace. | |
10549 | Need check Subtarget.hasFP16()? | |
10551 | Why handle i16? Isn't it handled by movw? | |
10744 | Why exclude f16? Is there better choice for fp16? | |
19023 | movss/movsh |
llvm/lib/Target/X86/X86InstrAVX512.td | ||
---|---|---|
82 | indent | |
3878 | Not sure this can be merged to 512 version load/store pattern with muticlass by abstract type info. | |
4159 | Why there is no OptForSize for vmovsh? | |
4478 | Sorry, I forgot what REV stand for. Do you know it? | |
llvm/lib/Target/X86/X86RegisterInfo.td | ||
570 | Given there is only EVEX instructions for fp16, is it necessary to add f16 type to it? | |
572 | Ditto. |
Thanks Yuanke.
clang/lib/Headers/avx512fp16intrin.h | ||
---|---|---|
292 | First, this is a simple mimic of _mm_mask_load_ss. | |
319 | This is used for preventing type-based alias analysis. "In the context of section 6.5 paragraph 7 of the C99 standard, an lvalue expression dereferencing such a pointer is treated like having a character type." | |
350 | Both load and store intrinsics only access 16bit memory, the different is the load intrinsic needs to set up the high bits of the XMM register (because we do return a 128 bits result). We don't need to do that for a store. | |
419 | Because __m128i is defined as <2 x i64>. __a[0] is correct only for i64 type. | |
clang/test/CodeGen/X86/avx512fp16-abi.c | ||
89 | This is the one with padding, since _Float16 aligns to 2 bytes while float aligns to 4. | |
llvm/include/llvm/IR/Intrinsics.td | ||
315 | LLVM IR serves for not only one type. __fp16 is still usable in Clang. Besides, OpenCL half type also use half in IR. And maybe we have other FE types too. So I'd like to keep it as is unless we have a better way to cover all other FE types. | |
llvm/include/llvm/Target/TargetSelectionDAG.td | ||
1054 | Good catch. I noticed it too, but forgot to change it. | |
llvm/lib/Target/X86/Disassembler/X86Disassembler.cpp | ||
341 | Yes, but I'm not sure if this is intentional. Maybe it keeps the shape in & X == X? | |
llvm/lib/Target/X86/X86.td | ||
189 | customise seems correct too. Anyway, I can change it. | |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
10549 | No, f16 is legal here, so it implies the feature. | |
10551 | No, we don't have a movw instruction. | |
10744 | We prefer to using shuffle vector rather than insert_vector_elt here, because we don't have a insert instruction for half type. | |
llvm/lib/Target/X86/X86InstrAVX512.td | ||
3878 | I think it is probably feasible. We may add a codegen only opcode to reuse VMOVDQU instruction defination. | |
4159 | Good catch. I think we should add it here. | |
4478 | I think REV is short for revert. Which allows a different encoding when operands order are reverted. | |
llvm/lib/Target/X86/X86RegisterInfo.td | ||
570 | I think so. For example, we may use some i16 instructions which may be or may finally turn into AVX2 ones. Adding to it is useful for them since VR128 is subset of VR128X. | |
llvm/test/CodeGen/X86/vector-reduce-fmax-nnan.ll | ||
374 | Because we allowed one combine after X86ISelLowering.cpp:41180 without check the feature. |
llvm/lib/Target/X86/X86InstrAVX512.td | ||
---|---|---|
4478 | It is short for "reverse". Meaing the operands are in the reversed order. There are two valid encodings moving from one register to another. This happens because there are separate opcodes for moving register to memory(Store) and moving memory to register(load). The memory operand for both of those opcodes can be a register as well. The assembler and isel always uses the register to register version of the load opcode. The reversed version is only used by the disassembler There is an exception to that. For VEX encoded AVX/AVX2 instructions, X86MCInstLowering will use an _REV move if it allows a 2 byte VEX prefix instead of a 3 byte VEX prefix. This doesn't apply to any AVX512 instructions though. |
llvm/lib/Target/X86/X86InstrAVX512.td | ||
---|---|---|
4478 | Thanks Craig for the information. |
llvm/lib/Target/X86/X86InstrAVX512.td | ||
---|---|---|
4478 |
I understand now. Thanks, Craig and Pengfei. |
clang/lib/CodeGen/TargetInfo.cpp | ||
---|---|---|
3479 | For 2 float, return <2xfloat> to be compatible to previous ABI? |
clang/lib/CodeGen/TargetInfo.cpp | ||
---|---|---|
3479 | It is already handled in line 3456. |
llvm/lib/Target/X86/X86InstrAVX512.td | ||
---|---|---|
4159 | Sorry, I think we should not add OptForSize here. Since we don't have a blendph instruction, I think we can always select it to movsh. Not sure if using pblendw is beneficial. |
I was tracking back a recent ABI break (also failing now in gcc 12, so maybe this irregularity is intentional), and was concerned that this commit is observed to cause the platform ABI to change depending on the feature flags of the current compilation unit. Prior to this change, f16 was always treated as i16 for the purpose of the calling-convention (e.g. returned in %ax). But after this change, the ABI of the value is now inconsistent between compile units. I made a small change to one of the existing tests to show this. Note how the callq result was in %ax without this mattr flag, and in %xmm0 with this mattr flag added. But the function known as "identity.half" is external, and did not change between those two calls to the llvm.
diff --git a/llvm/test/CodeGen/X86/half.ll b/llvm/test/CodeGen/X86/half.ll index 46179e7d9113..8c1b8c4b76ff 100644 --- a/llvm/test/CodeGen/X86/half.ll +++ b/llvm/test/CodeGen/X86/half.ll @@ -5,6 +5,8 @@ ; RUN: | FileCheck %s -check-prefixes=CHECK,CHECK-LIBCALL,BWOFF ; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+f16c -fixup-byte-word-insts=1 \ ; RUN: | FileCheck %s -check-prefixes=CHECK,BWON,BWON-F16C +; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+avx512fp16 -fixup-byte-word-insts=0 \ +; RUN: | FileCheck %s -check-prefixes=CHECK-CC ; RUN: llc < %s -mtriple=i686-unknown-linux-gnu -mattr +sse2 -fixup-byte-word-insts=0 \ ; RUN: | FileCheck %s -check-prefixes=CHECK-I686 @@ -163,16 +199,31 @@ define void @test_trunc32(float %in, half* %addr) #0 { ret void } +declare half @identity.half(half) + define void @test_trunc64(double %in, half* %addr) #0 { ; CHECK-LABEL: test_trunc64: ; CHECK: # %bb.0: ; CHECK-NEXT: pushq %rbx ; CHECK-NEXT: movq %rdi, %rbx ; CHECK-NEXT: callq __truncdfhf2@PLT +; CHECK-NEXT: # kill: def $ax killed $ax def $eax +; CHECK-NEXT: movl %eax, %edi +; CHECK-NEXT: callq identity.half@PLT ; CHECK-NEXT: movw %ax, (%rbx) ; CHECK-NEXT: popq %rbx ; CHECK-NEXT: retq ; +; CHECK-CC-LABEL: test_trunc64: +; CHECK-CC: # %bb.0: +; CHECK-CC-NEXT: pushq %rbx +; CHECK-CC-NEXT: movq %rdi, %rbx +; CHECK-CC-NEXT: vcvtsd2sh %xmm0, %xmm0, %xmm0 +; CHECK-CC-NEXT: callq identity.half@PLT +; CHECK-CC-NEXT: vmovsh %xmm0, (%rbx) +; CHECK-CC-NEXT: popq %rbx +; CHECK-CC-NEXT: retq +; ; CHECK-I686-LABEL: test_trunc64: ; CHECK-I686: # %bb.0: ; CHECK-I686-NEXT: pushl %esi @@ -181,12 +232,16 @@ define void @test_trunc64(double %in, half* %addr) #0 { ; CHECK-I686-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero ; CHECK-I686-NEXT: movsd %xmm0, (%esp) ; CHECK-I686-NEXT: calll __truncdfhf2 +; CHECK-I686-NEXT: # kill: def $ax killed $ax def $eax +; CHECK-I686-NEXT: movl %eax, (%esp) +; CHECK-I686-NEXT: calll identity.half@PLT ; CHECK-I686-NEXT: movw %ax, (%esi) ; CHECK-I686-NEXT: addl $8, %esp ; CHECK-I686-NEXT: popl %esi ; CHECK-I686-NEXT: retl %val16 = fptrunc double %in to half - store half %val16, half* %addr + %val16b = call half @identity.half(half %val16) + store half %val16b, half* %addr ret void }
Is this intentional? We do already have code to handle the ABI dependency on vector-sizes, and could add this to the list of flags that change the ABI (i.e. we disable it if it will break the ABI), but wanted to confirm first if that was the intent here.
discovered from https://github.com/JuliaLang/julia/issues/44829
Thanks @vtjnash for the information! Comments on https://github.com/JuliaLang/julia/issues/44829
Might be worth mentioning that it requires AVX512FP16 here