The IR that Clang emits for the (v)cvtsi2ss, (v)cvtsi2sd, (v)cvtsd2ss and (v)cvtss2sd intrinsics is lowered to a code sequence that includes redundant (v)movss/(v)movsd instructions. This patch adds patterns that optimize these sequences.
The Clang intrinsic for (v)cvtsd2ss is implemented using a builtin (__builtin_ia32_cvtsd2ss) and is not lowered to generic IR, so we don't see this happening. However, looking at the semantics, IIUC it seems that:
static __inline__ __m128 __DEFAULT_FN_ATTRS _mm_cvtsd_ss(__m128 __a, __m128d __b) { return (__m128)__builtin_ia32_cvtsd2ss((__v4sf)__a, (__v2df)__b); }
could be also implemented with:
static __inline__ __m128 __DEFAULT_FN_ATTRS _mm_cvtsd_ss(__m128 __a, __m128d __b) { __a[0] = __b[0]; return __a; }
Bottom line, I think you are right, because the above code doesn't have to come from an intrinsic. I'll add the pattern. (I should probably also open a Bugzilla for lowering _mm_cvtsd_ss() to generic IR, right?)
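To make the semantics under discussion concrete, here is a minimal sketch of the generic (non-builtin) form written with GCC/Clang vector extensions instead of the real headers. The type names `vec4f`/`vec2d` and the functions `cvtsd_ss_generic` and `cvtsd_ss_generic_check` are hypothetical, chosen for illustration only; this is not the code in Clang's headers.

```c
#include <assert.h>

/* GCC/Clang vector extensions. These typedef names are hypothetical
   stand-ins for __m128 and __m128d. */
typedef float  vec4f __attribute__((vector_size(16)));
typedef double vec2d __attribute__((vector_size(16)));

/* Generic form of the cvtsd2ss semantics: narrow the low double of b to
   float, write it into lane 0 of a, and leave lanes 1..3 of a untouched. */
static inline vec4f cvtsd_ss_generic(vec4f a, vec2d b) {
    a[0] = (float)b[0];
    return a;
}

/* Small self-check: only lane 0 changes; the upper lanes of a are kept
   and the upper lane of b is ignored. */
static void cvtsd_ss_generic_check(void) {
    vec4f a = {1.0f, 2.0f, 3.0f, 4.0f};
    vec2d b = {0.5, 9.0};
    vec4f r = cvtsd_ss_generic(a, b);
    assert(r[0] == 0.5f);
    assert(r[1] == 2.0f && r[2] == 3.0f && r[3] == 4.0f);
}
```

Code of this shape can be written by hand as well as produced by an intrinsic, which is why a pattern for it is useful regardless of how _mm_cvtsd_ss() itself is implemented.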
Thanks for the catch!
lib/Target/X86/X86InstrSSE.td | |
---|---|
1962 | For AVX512 I see two types of intrinsics that correspond to these instructions: |
1994 | I'll change this. Thanks. |
lib/Target/X86/X86InstrSSE.td | |
---|---|
1962 | @craig.topper @igorb Is using the AVX path for AVX512 alright with you guys? |
For AVX-512 we can use the following equivalents:
The Int_VCVTSD2SSrr patterns can use VCVTSD2SSZrr. For some reason this doesn't use the _Int suffix, but it does have the VR128X register type.
The Int_VCVTSS2SDrr patterns can use VCVTSS2SDZrr. Again, it uses the VR128X type.
The Int_VCVTSI2SS64rr patterns can use VCVTSI642SSZrr_Int.
The Int_VCVTSI2SSrr patterns can use VCVTSI2SSZrr_Int.
The Int_VCVTSI2SD64rr patterns can use VCVTSI642SDZrr_Int.
The Int_VCVTSI2SDrr patterns can use VCVTSI2SDZrr_Int.
"unnecessary" is spelled wrong.