- Enable FP16 type support and basic declarations used by the following patches.
- Enable new instructions VMOVW and VMOVSH.
- Address Craig's comments.
- Add more patterns for i16 lowering.
There's no difference in the assembly for an immediate value: https://godbolt.org/z/sMbrM611d. But vpbroadcastd has better latency than vpbroadcastw on Skylake according to the Intrinsics Guide. The only effect here is consistency with _mm256_and_epi32. Do you think it's better to use _mm256_set1_epi16?
Line 290 (On Diff #356376):
No. I'll move it to the 3rd patch and test it there.
Maybe we can use X86ScalarSSEf16, where SSE refers to SSE registers? Especially since the GCC community is proposing to support FP16 starting from SSE2.
float -> half?
Just curious: why not use __W directly?
What is may_alias used for?
I see that in _mm_mask_load_sh() we create a __m128h with the upper bits zeroed; I'm not sure whether we also need that in the store intrinsic.
Why not return __a directly?
Do we have a negative test case with padding between a and b?
I'm not sure about the legacy comments; should they say _Float16 now?
I notice it is true for other extloads. Is this the same as "true"?
Is this the same as ((byte1 & 0x8) == 0x0)?
Add comments for map5 and map6?
Also add it in isCMOVPseudo()?
Drop the brace.
Do we need to check Subtarget.hasFP16()?
Why handle i16? Isn't it handled by movw?
Why exclude f16? Is there better choice for fp16?
I'm not sure whether this can be merged with the 512-bit version of the load/store patterns using a multiclass that abstracts the type info.
Why there is no OptForSize for vmovsh?
Sorry, I forgot what REV stands for. Do you know?
Given there are only EVEX instructions for fp16, is it necessary to add the f16 type to it?
First, this is a simple mimic of _mm_mask_load_ss.
This is used to prevent type-based alias analysis.
"In the context of section 6.5 paragraph 7 of the C99 standard, an lvalue expression dereferencing such a pointer is treated like having a character type."
Both the load and store intrinsics only access 16 bits of memory; the difference is that the load intrinsic needs to set up the high bits of the XMM register (because we return a 128-bit result). We don't need to do that for a store.
Because __m128i is defined as <2 x i64>, returning __a directly would only be correct for the i64 type.
This is the one with padding, since _Float16 aligns to 2 bytes while float aligns to 4.
LLVM IR doesn't serve just one frontend type: __fp16 is still usable in Clang, the OpenCL half type also uses half in IR, and we may have other frontend types too. So I'd like to keep it as is unless we have a better way to cover all the other frontend types.
Good catch. I noticed it too, but forgot to change it.
Yes, but I'm not sure whether this is intentional. Maybe it preserves the shape in & X == X?
"customise" seems correct too. Anyway, I can change it.
No, f16 is legal here, so it implies the feature.
No, we don't have a movw instruction.
We prefer using shufflevector rather than insert_vector_elt here, because we don't have an insert instruction for the half type.
I think it is probably feasible. We could add a codegen-only opcode to reuse the VMOVDQU instruction definition.
Good catch. I think we should add it here.
I think REV is short for "revert", which allows a different encoding when the operand order is reversed.
I think so. For example, we may use some i16 instructions which are, or may eventually be turned into, AVX2 ones. Adding the type is useful for them since VR128 is a subset of VR128X.
Line 374 (On Diff #363946):
Because we allow a combine after X86ISelLowering.cpp:41180 without checking the feature.
It is short for "reverse", meaning the operands are in the reversed order. There are two valid encodings for moving from one register to another. This happens because there are separate opcodes for moving a register to memory (store) and moving memory to a register (load). The memory operand for both of those opcodes can be a register as well. The assembler and isel always use the register-to-register version of the load opcode; the reversed version is only used by the disassembler.
There is an exception to that: for VEX-encoded AVX/AVX2 instructions, X86MCInstLowering will use a _REV move if it allows a 2-byte VEX prefix instead of a 3-byte VEX prefix. This doesn't apply to any AVX512 instructions, though.
I understand now. Thanks, Craig and Pengfei.
Sorry, I think we should not add OptForSize here.
Since we don't have a blendph instruction, I think we can always select it to movsh. I'm not sure whether using pblendw would be beneficial.